"The Nature of Selection", Price 1971
In the era of transfer learning, training neural networks from scratch is
becoming obsolete. Transfer learning leverages prior knowledge for new tasks,
conserving computational resources. While its advantages are well-documented,
we uncover a notable drawback: networks tend to prioritize basic data patterns,
forsaking valuable pre-learned features. We term this behavior "feature
erosion" and analyze its impact on network performance and internal
representations.
( 2 min )
It stands to reason that the amount and quality of big data are of key
importance for setting up accurate AI-driven models. Nonetheless, we believe
there are still critical roadblocks in the inherent generation of databases
that are often underestimated and poorly discussed in the literature. In our
view, such issues can seriously hinder the AI-based discovery process, even
when high-quality, sufficiently large and highly reputable data sources are
available. Here, considering superconducting and thermoelectric materials as
two representative case studies, we specifically discuss three aspects:
intrinsically biased sample selection, possible hidden variables, and
disparate data age. Importantly, we suggest and test what is, to our
knowledge, a first strategy capable of detecting and quantifying the presence
of the intrinsic data bias.
( 2 min )
This paper presents Fossil 2.0, a new major release of a software tool for
the synthesis of certificates (e.g., Lyapunov and barrier functions) for
dynamical systems modelled as ordinary differential and difference equations.
Fossil 2.0 is much improved from its original release, including new
interfaces, a significantly expanded certificate portfolio, controller
synthesis and enhanced extensibility. We present these new features as part of
this tool paper. Fossil implements a counterexample-guided inductive synthesis
(CEGIS) loop ensuring the soundness of the method. Our tool uses neural
networks as templates to generate candidate functions, which are then formally
proven by an SMT solver acting as an assertion verifier. Improvements with
respect to the first release include a wider range of certificates, synthesis
of control laws, and support for discrete-time models.
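To make the CEGIS loop concrete, here is a minimal sketch of the alternation
Fossil implements; Learner and Verifier are hypothetical stand-ins for the
neural template trainer and the SMT-based assertion verifier, not Fossil's
actual API:

    def cegis(learner, verifier, samples, max_iters=100):
        """Counterexample-guided inductive synthesis (CEGIS) loop sketch."""
        for _ in range(max_iters):
            candidate = learner.fit(samples)               # train a neural certificate template
            cex = verifier.find_counterexample(candidate)  # SMT solver checks the certificate conditions
            if cex is None:
                return candidate                           # no counterexample: certificate is formally proven
            samples.append(cex)                            # otherwise, refine the training set with the witness
        raise RuntimeError("no certificate synthesized within budget")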
( 2 min )
This paper studies convergence rates for some value function approximations
that arise in a collection of reproducing kernel Hilbert spaces (RKHS)
$H(\Omega)$. By casting an optimal control problem in a specific class of
native spaces, strong rates of convergence are derived for the operator
equation that enables offline approximations that appear in policy iteration.
Explicit upper bounds on the error in value function and controller
approximations are derived in terms of the power function $\mathcal{P}_{H,N}$
for the space of finite-dimensional approximants $H_N$ in the native space
$H(\Omega)$. These bounds are geometric in nature and refine some well-known,
now classical results concerning convergence of approximations of value functions.
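For context, in standard native-space (RKHS) interpolation theory the power
function has the closed form (a textbook identity, not a result of this paper):

$$ \mathcal{P}_{H,N}(x)^2 \;=\; K(x,x) \;-\; k_N(x)^\top K_N^{-1} k_N(x), $$

where $k_N(x) = (K(x,x_1), \ldots, K(x,x_N))^\top$ collects kernel evaluations
at the centers and $K_N$ is their Gram matrix; it controls pointwise error
through $|f(x) - \Pi_N f(x)| \le \mathcal{P}_{H,N}(x)\,\|f\|_{H(\Omega)}$,
which is how bounds of the kind described above arise.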
( 2 min )
This paper focuses on the task of Extreme Multi-Label Classification (XMC)
whose goal is to predict multiple labels for each instance from an extremely
large label space. While existing research has primarily focused on fully
supervised XMC, real-world scenarios often lack complete supervision signals,
highlighting the importance of zero-shot settings. Given the large label space,
utilizing in-context learning approaches is not trivial. We address this issue
by introducing In-Context Extreme Multilabel Learning (ICXML), a two-stage
framework that cuts down the search space by generating a set of candidate
labels through in-context learning and then reranks them. Extensive experiments
suggest that ICXML advances the state of the art on two diverse public
benchmarks.
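A minimal sketch of such a generate-then-rerank pipeline (the prompt formats
and the llm callable are hypothetical stand-ins, not the paper's
implementation):

    def generate_candidates(instance, demonstrations, llm, k=50):
        """Stage 1: in-context learning proposes a label shortlist."""
        prompt = "\n".join(f"Input: {x}\nLabels: {', '.join(ys)}"
                           for x, ys in demonstrations)
        prompt += f"\nInput: {instance}\nLabels:"
        raw = llm(prompt)                                 # e.g. "label_a, label_b, ..."
        return list(dict.fromkeys(raw.split(", ")))[:k]   # dedupe, keep top k

    def rerank(instance, candidates, llm, top_n=5):
        """Stage 2: score only the shortlist instead of the full label space."""
        score = lambda c: float(llm(
            f"On a 0-1 scale, how well does label '{c}' fit: {instance}?"))
        return sorted(candidates, key=score, reverse=True)[:top_n]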
( 2 min )
Accurately predicting drug-drug interactions (DDI) for emerging drugs, which
offer possibilities for treating and alleviating diseases, with computational
methods can improve patient care and contribute to efficient drug development.
However, many existing computational methods require large amounts of known DDI
information, which is scarce for emerging drugs. In this paper, we propose
EmerGNN, a graph neural network (GNN) that can effectively predict interactions
for emerging drugs by leveraging the rich information in biomedical networks.
EmerGNN learns pairwise representations of drugs by extracting the paths
between drug pairs, propagating information from one drug to the other, and
incorporating the relevant biomedical concepts on the paths. The different
edges on the biomedical network are weighted to indicate the relevance for the
target DDI prediction. Overall, EmerGNN has higher accuracy than existing
approaches in predicting interactions for emerging drugs and can identify the
most relevant information on the biomedical network.
( 2 min )
We explore the abstract reasoning abilities of text-only and multimodal
versions of GPT-4, using the ConceptARC benchmark [10], which is designed to
evaluate robust understanding and reasoning with core-knowledge concepts. We
extend the work of Moskvichev et al. [10] by evaluating GPT-4 on more detailed,
one-shot prompting (rather than simple, zero-shot prompts) with text versions
of ConceptARC tasks, and by evaluating GPT-4V, the multimodal version of GPT-4,
on zero- and one-shot prompts using image versions of the simplest tasks. Our
experimental results support the conclusion that neither version of GPT-4 has
developed robust abstraction abilities at humanlike levels.
( 2 min )
The widespread integration of autoregressive large language models (AR-LLMs),
such as ChatGPT, across established applications, like search engines, has
introduced critical vulnerabilities with uniquely scalable characteristics. In
this commentary, we analyse these vulnerabilities, their dependence on natural
language as a vector of attack, and their challenges to cybersecurity best
practices. We offer recommendations designed to mitigate these challenges.
( 2 min )
We applied a data-driven approach that explores the usability of the NetMob
2023 dataset in modelling mobility patterns within an urban context. We
combined the data with a highly suitable external source, the ENACT dataset,
which provides a 1 km x 1 km grid with estimates of the day and night population
across Europe. We developed three sets of XGBoost models that predict the
population in each 100m x 100m grid cell used in NetMob2023 based on the mobile
data traffic of the 68 online services covered in the dataset, using the ENACT
values as ground truth. The results suggest that the NetMob 2023 data can be
useful for the estimation of the day and night population at grid cell level
and can explain part of the dynamics of urban mobility.
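A minimal sketch of the modelling setup described above, with synthetic
stand-in data in place of the NetMob traffic features and ENACT populations:

    import numpy as np
    import xgboost as xgb
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import r2_score

    # Stand-in data: one row per grid cell, one column per online service's
    # mobile data traffic (68 services in NetMob 2023); the target plays the
    # role of the ENACT population estimate for that cell.
    rng = np.random.default_rng(0)
    X = rng.gamma(2.0, 1.0, size=(10_000, 68))
    y = X @ rng.uniform(0, 5, size=68) + rng.normal(0, 10, size=10_000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = xgb.XGBRegressor(n_estimators=500, max_depth=6, learning_rate=0.05)
    model.fit(X_tr, y_tr)
    print("held-out R^2:", r2_score(y_te, model.predict(X_te)))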
( 2 min )
The theory of statistical learning has focused on variational objectives
expressed on functions. In this note, we discuss motivations to write similar
objectives on measures, in particular to discuss out-of-distribution
generalization and weakly-supervised learning. This raises a natural question:
can one cast usual statistical learning results as objectives expressed on
measures? Does the resulting construction lead to new algorithms of practical
interest?
( 2 min )
Individualized treatment decisions can improve health outcomes, but using
data to make these decisions in a reliable, precise, and generalizable way is
challenging with a single dataset. Leveraging multiple randomized controlled
trials allows for the combination of datasets with unconfounded treatment
assignment to better estimate heterogeneous treatment effects. This paper
discusses several non-parametric approaches for estimating heterogeneous
treatment effects using data from multiple trials. We extend single-study
methods to a scenario with multiple trials and explore their performance
through a simulation study, with data generation scenarios that have differing
levels of cross-trial heterogeneity. The simulations demonstrate that methods
that directly allow for heterogeneity of the treatment effect across trials
perform better than methods that do not, and that the choice of single-study
method matters based on the functional form of the treatment effect. Finally,
we discuss which methods perform well in each setting and then apply them to
four randomized controlled trials to examine effect heterogeneity of treatments
for major depressive disorder.
( 2 min )
How do score-based generative models (SBMs) learn the data distribution
supported on a low-dimensional manifold? We investigate the score model of a
trained SBM through its linear approximations and subspaces spanned by local
feature vectors. During diffusion as the noise decreases, the local
dimensionality increases and becomes more varied between different sample
sequences. Importantly, we find that the learned vector field mixes samples by
a non-conservative field within the manifold, although it denoises with normal
projections as if there is an energy function in off-manifold directions. At
each noise level, the subspace spanned by the local features overlaps with an
effective density function. These observations suggest that SBMs can flexibly
mix samples with the learned score field while carefully maintaining a
manifold-like structure of the data distribution.
( 2 min )
The estimation of categorical distributions under marginal constraints
summarizing some sample from a population in the most-generalizable way is key
for many machine-learning and data-driven approaches. We provide a
parameter-agnostic theoretical framework that enables this task by ensuring (i)
that a categorical distribution of Maximum Entropy under marginal constraints
always exists and (ii) that it is unique. The procedure of iterative
proportional fitting (IPF) naturally estimates that distribution from any
consistent set of marginal constraints directly in the space of probabilities,
thus deductively identifying a least-biased characterization of the population.
The theoretical framework together with IPF leads to a holistic workflow that
enables modeling any class of categorical distributions solely using the
phenomenological information provided.
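A minimal sketch of IPF for a two-way table (the general workflow extends this
to higher-order marginals):

    import numpy as np

    def ipf(row_marginal, col_marginal, n_iter=1000, tol=1e-12):
        """Iterative proportional fitting: starting from the uniform table,
        alternately rescale rows and columns to match the target marginals.
        The fixed point is the maximum-entropy distribution consistent with
        the constraints."""
        r = np.asarray(row_marginal, dtype=float)
        c = np.asarray(col_marginal, dtype=float)
        assert np.isclose(r.sum(), c.sum()), "marginals must be consistent"
        p = np.full((r.size, c.size), 1.0 / (r.size * c.size))
        for _ in range(n_iter):
            p *= (r / p.sum(axis=1))[:, None]   # enforce row sums
            p *= (c / p.sum(axis=0))[None, :]   # enforce column sums
            if np.allclose(p.sum(axis=1), r, atol=tol):
                return p
        return p

    p = ipf([0.2, 0.8], [0.5, 0.3, 0.2])
    print(p.sum(axis=1), p.sum(axis=0))   # marginals match the targets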
( 2 min )
Despite a great deal of research, it is still not well-understood why trained
neural networks are highly vulnerable to adversarial examples. In this work we
focus on two-layer neural networks trained using data which lie on a low
dimensional linear subspace. We show that standard gradient methods lead to
non-robust neural networks, namely, networks which have large gradients in
directions orthogonal to the data subspace, and are susceptible to small
adversarial $L_2$-perturbations in these directions. Moreover, we show that
decreasing the initialization scale of the training algorithm, or adding $L_2$
regularization, can make the trained network more robust to adversarial
perturbations orthogonal to the data.
( 2 min )
Urban traffic congestion remains a pressing challenge in our rapidly
expanding cities, despite the abundance of available data and the efforts of
policymakers. By leveraging behavioral system theory and data-driven control,
this paper exploits the DeePC algorithm in the context of urban traffic control
performed via dynamic traffic lights. To validate our approach, we consider a
high-fidelity case study using the state-of-the-art simulation software package
Simulation of Urban MObility (SUMO). Preliminary results indicate that DeePC
outperforms existing approaches across various key metrics, including travel
time and CO$_2$ emissions, demonstrating its potential for effective traffic
management.
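For readers unfamiliar with DeePC, a minimal sketch of its core idea,
replacing a parametric model with Hankel matrices of recorded data, is given
below on a toy scalar system (this is the generic prediction step, not the
paper's traffic controller):

    import numpy as np

    def block_hankel(w, L):
        """Stack all length-L windows of the signal w (shape T x m) as columns."""
        T, m = w.shape
        H = np.empty((L * m, T - L + 1))
        for i in range(T - L + 1):
            H[:, i] = w[i:i + L].reshape(-1)
        return H

    # Offline: collect persistently exciting input/output data from the plant
    # (here a toy first-order system standing in for the traffic network).
    rng = np.random.default_rng(0)
    T = 200
    u = rng.normal(size=(T, 1))
    y = np.zeros((T, 1))
    for t in range(1, T):
        y[t] = 0.8 * y[t - 1] + 0.5 * u[t - 1]      # dynamics unknown to DeePC

    T_ini, N = 4, 10                                 # past and future horizons
    U, Y = block_hankel(u, T_ini + N), block_hankel(y, T_ini + N)
    U_p, U_f = U[:T_ini], U[T_ini:]
    Y_p, Y_f = Y[:T_ini], Y[T_ini:]

    # Online: find a combination g of recorded trajectories consistent with the
    # most recent T_ini samples and a candidate future input, then read off the
    # predicted outputs Y_f @ g. DeePC optimizes the input (with regularization
    # on g for noisy data); here we only show the prediction step.
    u_ini, y_ini, u_f = u[-T_ini:], y[-T_ini:], np.zeros((N, 1))
    A = np.vstack([U_p, Y_p, U_f])
    b = np.concatenate([u_ini, y_ini, u_f]).ravel()
    g, *_ = np.linalg.lstsq(A, b, rcond=None)
    y_pred = Y_f @ g                                 # N-step output prediction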
( 2 min )
In this paper, we tackle an online network resource allocation problem with
job transfers. The network is composed of many servers connected by
communication links. The system operates in discrete time; at each time slot,
the administrator reserves resources at servers for future job requests, and a
cost is incurred for the reservations made. Then, upon reception, the jobs
may be transferred between the servers to best accommodate the demands. This
incurs an additional transport cost. Finally, if a job request cannot be
satisfied, there is a violation that incurs a cost for the blocked job. We
propose a randomized online algorithm based on the exponentially weighted
method. We prove that our algorithm enjoys regret that is sub-linear in time,
which indicates that the algorithm is adapting and learning from its
experiences and is becoming more efficient in its decision-making as it
accumulates more data. Moreover, we test the performance of our algorithm on
artificial data and compare it against a reinforcement learning method where we
show that our proposed method outperforms the latter.
( 2 min )
Large language models (LLMs) such as LLaMA and OpenAI’s GPT-4 are revolutionizing technology. However, one of the common complaints about LLMs is their speed, or lack thereof. In many cases, it takes a long time to get an answer from them. This limits LLMs’ applications and their usefulness in latency-critical functions, such as chatbots, copilots, […]
The post Skeleton-of-Thought: Parallel decoding speeds up and improves LLM output appeared first on Microsoft Research.
( 12 min )
Good news for car lovers: Two acclaimed auto shows, taking place now through next week, are delighting attendees with displays of next-generation automotive designs powered by AI. Hundreds of thousands of auto enthusiasts worldwide are expected to visit Guangzhou, China — known as the city of flowers — to attend its auto show, running through […]
( 6 min )
European startups will get a massive boost from a new generation of AI infrastructure, NVIDIA founder and CEO Jensen Huang said Friday in a fireside chat with iliad Group Deputy CEO Aude Durand — and it’s coming just in time. “We’re now seeing a major second wave,” Huang said of the state of AI during […]
( 7 min )
Europe’s startup ecosystem is getting a boost of accelerated computing for generative AI. NVIDIA and cloud service provider (CSP) Scaleway are working together to deliver access to GPUs, NVIDIA AI Enterprise software, and services for turbocharging large language models (LLMs) and generative AI development for European startups. Scaleway, a subsidiary of French telecommunications provider iliad […]
( 6 min )
Amazon Interactive Video Service (Amazon IVS) is a managed live streaming solution that is designed to provide a quick and straightforward setup to let you build interactive video experiences and handles interactive video content from ingestion to delivery. With the increased usage of live streaming, the need for effective content moderation becomes even more crucial. […]
( 9 min )
Generative AI models have the potential to revolutionize enterprise operations, but businesses must carefully consider how to harness their power while overcoming challenges such as safeguarding data and ensuring the quality of AI-generated content. The Retrieval-Augmented Generation (RAG) framework augments prompts with external data from multiple sources, such as document repositories, databases, or APIs, to […]
( 6 min )
From enhancing the conversational experience to agent assistance, there are plenty of ways that generative artificial intelligence (AI) and foundation models (FMs) can help deliver faster, better support. With the increasing availability and diversity of FMs, it’s difficult to experiment and keep up-to-date with the latest model versions. Amazon Bedrock is a fully managed service […]
( 7 min )
This paper explores the potential of the transformer models for learning
Granger causality in networks with complex nonlinear dynamics at every node, as
in neurobiological and biophysical networks. Our study primarily focuses on a
proof-of-concept investigation based on simulated neural dynamics, for which
the ground-truth causality is known through the underlying connectivity matrix.
For transformer models trained to forecast neuronal population dynamics, we
show that the cross attention module effectively captures the causal
relationship among neurons, with accuracy equal to or better than that of the
most popular Granger causality analysis method. While we acknowledge that
real-world neurobiology data will bring further challenges, including dynamic
connectivity and unobserved variability, this research offers an encouraging
preliminary glimpse into the utility of the transformer model for causal
representation learning in neuroscience.
( 2 min )
Hyperparameters of Deep Learning (DL) pipelines are crucial for their
downstream performance. While a large number of methods for Hyperparameter
Optimization (HPO) have been developed, their incurred costs are often
untenable for modern DL. Consequently, manual experimentation is still the most
prevalent approach to optimize hyperparameters, relying on the researcher's
intuition, domain knowledge, and cheap preliminary explorations. To resolve
this misalignment between HPO algorithms and DL researchers, we propose
PriorBand, an HPO algorithm tailored to DL, able to utilize both expert beliefs
and cheap proxy tasks. Empirically, we demonstrate PriorBand's efficiency
across a range of DL benchmarks and show its gains under informative expert
input and robustness against poor expert beliefs.
( 2 min )
In recent years, Reward Machines (RMs) have stood out as a simple yet
effective automata-based formalism for exposing and exploiting task structure
in reinforcement learning settings. Despite their relevance, little to no
attention has been directed to the study of their security implications and
robustness to adversarial scenarios, likely due to their recent appearance in
the literature. With my thesis, I aim to provide the first analysis of the
security of RM-based reinforcement learning techniques, with the hope of
motivating further research in the field, and I propose and evaluate a novel
class of attacks on RM-based techniques: blinding attacks.
( 2 min )
An ideal length-extrapolatable Transformer language model can handle
sequences longer than the training length without any fine-tuning. Such
long-context utilization capability relies heavily on a flexible positional
embedding design. Upon investigating the flexibility of existing large
pre-trained Transformer language models, we find that the T5 family deserves a
closer look, as its positional embeddings capture rich and flexible attention
patterns. However, T5 suffers from the dispersed attention issue: the longer
the input sequence, the flatter the attention distribution. To alleviate the
issue, we propose two attention alignment strategies via temperature scaling.
Our findings show improvement on the long-context utilization capability of T5
on language modeling, retrieval, multi-document question answering, and code
completion tasks without any fine-tuning. This suggests that a flexible
positional embedding design and attention alignment can go a long way toward
Transformer length extrapolation.
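As a generic illustration of temperature scaling (a toy sketch of the
mechanism, not the paper's specific alignment strategies): dividing attention
logits by a temperature below one re-sharpens a distribution that has
flattened on longer-than-trained inputs.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def attention_weights(q, K, temperature=1.0):
        """Single-query scaled dot-product attention with temperature."""
        logits = K @ q / np.sqrt(q.shape[-1])
        return softmax(logits / temperature)

    rng = np.random.default_rng(0)
    q, K = rng.normal(size=64), rng.normal(size=(2048, 64))   # long context
    for tau in (1.0, 0.5):
        w = attention_weights(q, K, temperature=tau)
        print(f"tau={tau}: entropy={-(w * np.log(w)).sum():.2f}")  # lower tau, lower entropy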
( 2 min )
Named Entity Recognition (NER) is essential in various Natural Language
Processing (NLP) applications. Traditional NER models are effective but limited
to a set of predefined entity types. In contrast, Large Language Models (LLMs)
can extract arbitrary entities through natural language instructions, offering
greater flexibility. However, their size and cost, particularly for those
accessed via APIs like ChatGPT, make them impractical in resource-limited
scenarios. In this paper, we introduce a compact NER model trained to identify
any type of entity. Leveraging a bidirectional transformer encoder, our model,
GLiNER, facilitates parallel entity extraction, an advantage over the slow
sequential token generation of LLMs. Through comprehensive testing, GLiNER
demonstrates strong performance, outperforming both ChatGPT and fine-tuned LLMs
in zero-shot evaluations on various NER benchmarks.
( 2 min )
Traffic simulators are used to generate data for learning in intelligent
transportation systems (ITSs). A key question is to what extent their modelling
assumptions affect the capabilities of ITSs to adapt to various scenarios when
deployed in the real world. This work focuses on two simulators commonly used
to train reinforcement learning (RL) agents for traffic applications, CityFlow
and SUMO. A controlled virtual experiment varying driver behavior and
simulation scale finds evidence against distributional equivalence in
RL-relevant measures from these simulators, with the root mean squared error
and KL divergence being significantly greater than 0 for all assessed measures.
While granular real-world validation generally remains infeasible, these
findings suggest that traffic simulators are not a deus ex machina for RL
training: understanding the impacts of inter-simulator differences is necessary
to train and deploy RL-based ITSs.
( 2 min )
Some stars end their lives in explosions called supernovae (SNe). SNe release
a substantial amount of matter and energy into the interstellar medium,
providing significant feedback to star formation and gas dynamics in a galaxy.
While such feedback has a crucial role in galaxy formation and evolution, in
simulations of galaxy formation it has only been implemented using simple
{\it sub-grid models}, instead of numerically solving the evolution of gas
elements around SNe in detail, due to a lack of resolution. We develop a
method combining machine learning and Gibbs sampling to predict how a
supernova (SN) affects the surrounding gas. Our model reproduces the thermal
energy and momentum distributions with higher fidelity than low-resolution SN
simulations.
Our method can replace the SN sub-grid models and help properly simulate
un-resolved SN feedback in galaxy formation simulations. We find that employing
our new approach reduces the necessary computational cost to $\sim$ 1 percent
compared to directly resolving SN feedback.
( 2 min )
Recognizing emotions in spoken communication is crucial for advanced
human-machine interaction. Current emotion detection methodologies often
display biases when applied cross-corpus. To address this, our study
amalgamates 16 diverse datasets, resulting in 375 hours of data across
languages like English, Chinese, and Japanese. We propose a soft labeling
system to capture gradational emotional intensities. Using the Whisper encoder
and data augmentation methods inspired by contrastive learning, our method
emphasizes the temporal dynamics of emotions. Our validation on four
multilingual datasets demonstrates notable zero-shot generalization. We publish
our open source model weights and initial promising results after fine-tuning
on Hume-Prosody.
( 2 min )
This study proposes a new approach that investigates differences in
topological characteristics of visual networks, which are constructed using
fMRI BOLD time-series corresponding to visual datasets of COCO, ImageNet, and
SUN. A publicly available BOLD5000 dataset is utilized that contains fMRI scans
while viewing 5254 images of diverse complexities. The objective of this study
is to examine how network topology differs in response to distinct visual
stimuli from these visual datasets. To achieve this, 0- and 1-dimensional
persistence diagrams are computed for each visual network representing COCO,
ImageNet, and SUN. For extracting suitable features from topological
persistence diagrams, K-means clustering is executed. The extracted K-means
cluster features are fed to a novel deep-hybrid model that yields accuracy in
the range of 90%-95% in classifying these visual networks. To understand
vision, this type of visual network categorization across visual datasets is
important as it captures differences in BOLD signals while perceiving images
with different contexts and complexities. Furthermore, distinctive topological
patterns of the visual networks associated with each dataset, as revealed by
this study, could potentially lead to the development of future neuroimaging
biomarkers for diagnosing visual processing disorders like visual agnosia or
prosopagnosia, and tracking changes in visual cognition over time.
( 3 min )
This article provides an understanding of Natural Language Processing
techniques in the framework of financial regulation, more specifically in order
to perform semantic matching search between rules and policy when no dataset is
available for supervised learning. We outline how to outperform simple
pre-trained sentence-transformer models using freely available resources and
explain the mathematical concepts behind the key building blocks of Natural
Language Processing.
( 2 min )
We study mean-field variational inference in a Bayesian linear model when the
sample size n is comparable to the dimension p. In high dimensions, the common
approach of minimizing a Kullback-Leibler divergence from the posterior
distribution, or maximizing an evidence lower bound, may deviate from the true
posterior mean and underestimate posterior uncertainty. We study instead
minimization of the TAP free energy, showing in a high-dimensional asymptotic
framework that it has a local minimizer which provides a consistent estimate of
the posterior marginals and may be used for correctly calibrated posterior
inference. Geometrically, we show that the landscape of the TAP free energy is
strongly convex in an extensive neighborhood of this local minimizer, which
under certain general conditions can be found by an Approximate Message Passing
(AMP) algorithm. We then exhibit an efficient algorithm that linearly converges
to the minimizer within this local neighborhood. In settings where it is
conjectured that no efficient algorithm can find this local neighborhood, we
prove analogous geometric properties for a local minimizer of the TAP free
energy reachable by AMP, and show that posterior inference based on this
minimizer remains correctly calibrated.
( 2 min )
Modeling is crucial to understanding the effect of greenhouse gases, warming,
and ice sheet melting on the ocean. At the same time, ocean processes affect
phenomena such as hurricanes and droughts. Parameters in the models that cannot
be physically measured have a significant effect on the model output. For an
idealized ocean model, we generated perturbed parameter ensemble data and
trained surrogate neural network models. The neural surrogates accurately
predicted the one-step forward dynamics, from which we then computed the
parametric sensitivity.
( 2 min )
Dynamic Item Response Models extend the standard Item Response Theory (IRT)
to capture temporal dynamics in learner ability. While these models have the
potential to allow instructional systems to actively monitor the evolution of
learner proficiency in real time, existing dynamic item response models rely on
expensive inference algorithms that scale poorly to massive datasets. In this
work, we propose Variational Temporal IRT (VTIRT) for fast and accurate
inference of dynamic learner proficiency. VTIRT offers orders of magnitude
speedup in inference runtime while still providing accurate inference.
Moreover, the proposed algorithm is intrinsically interpretable by virtue of
its modular design. When applied to 9 real student datasets, VTIRT consistently
yields improvements in predicting future learner performance over other learner
proficiency models.
( 2 min )
Approximate message passing (AMP) is a family of iterative algorithms that
generalize matrix power iteration. AMP algorithms are known to optimally solve
many average-case optimization problems. In this paper, we show that a large
class of AMP algorithms can be simulated in polynomial time by \emph{local
statistics hierarchy} semidefinite programs (SDPs), even when an unknown
principal minor of measure $1/\mathrm{polylog}(\mathrm{dimension})$ is
adversarially corrupted. Ours are the first robust guarantees for many of these
problems. Further, our results offer an interesting counterpoint to strong
lower bounds against less constrained SDP relaxations for average-case
max-cut-gain (a.k.a. "optimizing the Sherrington-Kirkpatrick Hamiltonian") and
other problems.
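To fix ideas, here is a toy AMP iteration for a rank-one spiked matrix (a
standard textbook instance, not this paper's SDP construction): compared with
power iteration, AMP applies a componentwise nonlinearity and subtracts an
Onsager correction term.

    import numpy as np

    rng = np.random.default_rng(0)
    n, lam = 2000, 2.0
    v = rng.choice([-1.0, 1.0], size=n)               # planted +/-1 signal
    W = rng.normal(size=(n, n)) / np.sqrt(n)
    W = (W + W.T) / np.sqrt(2)                        # Wigner noise
    Y = lam * np.outer(v, v) / n + W                  # spiked observation

    f = np.tanh                                       # denoiser nonlinearity
    x_prev, x = np.zeros(n), rng.normal(size=n)
    for _ in range(20):
        onsager = np.mean(1 - f(x) ** 2)              # empirical E[f'(x)]
        x, x_prev = Y @ f(x) - onsager * f(x_prev), x # AMP update
    print("overlap with signal:", abs(f(x) @ v) / n)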
( 2 min )
Self-supervised representation learning often uses data augmentations to
induce some invariance to "style" attributes of the data. However, with
downstream tasks generally unknown at training time, it is difficult to deduce
a priori which attributes of the data are indeed "style" and can be safely
discarded. To address this, we introduce a more principled approach that seeks
to disentangle style features rather than discard them. The key idea is to add
multiple style embedding spaces where: (i) each is invariant to all-but-one
augmentation; and (ii) joint entropy is maximized. We formalize our structured
data-augmentation procedure from a causal latent-variable-model perspective,
and prove identifiability of both content and (multiple blocks of) style
variables. We empirically demonstrate the benefits of our approach on synthetic
datasets and then present promising but limited results on ImageNet.
( 2 min )
Image Segmentation is one of the core tasks in Computer Vision, and solving it
often depends on modeling the image appearance data via the color
distributions of each of its constituent regions. Whereas many segmentation
algorithms handle the dependence on appearance models using alternation or
implicit methods, we propose here a new approach to estimate them directly
from the image without prior information on the underlying segmentation. Our
method uses local high-order color statistics from the image as input to a
tensor factorization-based estimator for latent variable models. This approach
is able to estimate models in multi-region images and automatically output the
region proportions without prior user interaction, overcoming the drawbacks of
a prior attempt at this problem. We also demonstrate the performance of our
proposed method in many challenging synthetic and real imaging scenarios and
show that it leads to an efficient segmentation algorithm.
( 2 min )
We consider a deep neural network estimator based on empirical risk
minimization with $\ell_1$-regularization. We derive a general bound for its excess
risk in regression and classification (including multiclass), and prove that it
is adaptively nearly-minimax (up to log-factors) simultaneously across the
entire range of various function classes.
( 2 min )
We study the training of deep neural networks by gradient descent where
floating-point arithmetic is used to compute the gradients. In this framework
and under realistic assumptions, we demonstrate that it is highly unlikely to
find ReLU neural networks that maintain, in the course of training with
gradient descent, superlinearly many affine pieces with respect to their number
of layers. In virtually all approximation theoretical arguments that yield
high-order polynomial rates of approximation, sequences of ReLU neural networks
with exponentially many affine pieces compared to their numbers of layers are
used. As a consequence, we conclude that approximating sequences of ReLU neural
networks resulting from gradient descent in practice differ substantially from
theoretically constructed sequences. The assumptions and the theoretical
results are compared to a numerical study, which yields concurring results.
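A quick way to probe this empirically (a rough sketch, not the paper's
methodology): count the affine pieces of a one-dimensional ReLU network by
counting slope changes of its graph on a fine grid, before and during training.

    import numpy as np

    def count_affine_pieces(net, xs, tol=1e-8):
        """Count affine pieces of a scalar piecewise-linear function on a grid."""
        slopes = np.diff(net(xs)) / np.diff(xs)
        return 1 + int(np.count_nonzero(np.abs(np.diff(slopes)) > tol))

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=(16, 1))
    w2 = rng.normal(size=16)
    net = lambda x: np.maximum(W1 * x + b1, 0.0).T @ w2   # one hidden ReLU layer

    xs = np.linspace(-5.0, 5.0, 100_001)
    print(count_affine_pieces(net, xs))   # at most 17 pieces for width 16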
( 2 min )
How do you train an AI to understand clinical language with less clinical data? Train another AI to synthesize training data. Artificial intelligence is changing the way medicine is done, and is increasingly being used in all sorts of clinical tasks. This is fueled by generative AI and models like GatorTronGPT, a generative language model […]
( 5 min )
Human analysts can no longer effectively defend against the increasing speed and complexity of cybersecurity attacks. The amount of data is simply too large to screen manually. Generative AI, the most transformative tool of our time, enables a kind of digital jiu jitsu. It lets companies shift the force of data that threatens to overwhelm […]
( 6 min )
3D artists can improve the productivity and efficiency of their generative AI-enabled content-creation workflows thanks to the latest updates to popular OpenUSD software.
( 7 min )
The fastest way to give the gift of cloud gaming starts this GFN Thursday: For a limited time, every six-month GeForce NOW Ultimate membership includes three months of PC Game Pass. Also, the newest GeForce NOW app update is rolling out to members, including Xbox Game Syncing and more improvements. Plus, take advantage of a […]
( 7 min )
This is a joint blog with AWS and Philips. Philips is a health technology company focused on improving people’s lives through meaningful innovation. Since 2014, the company has been offering customers its Philips HealthSuite Platform, which orchestrates dozens of AWS services that healthcare and life sciences companies use to improve patient care. It partners with […]
( 15 min )
Whisper is an Automatic Speech Recognition (ASR) model that has been trained using 680,000 hours of supervised data from the web, encompassing a range of languages and tasks. One of its limitations is the low-performance on low-resource languages such as Marathi language and Dravidian languages, which can be remediated with fine-tuning. However, fine-tuning a Whisper […]
( 7 min )
Determining the value of housing is a classic example of using machine learning (ML). In this post, we discuss the use of an open-source model specifically designed for the task of visual question answering (VQA). With VQA, you can ask a question of a photo using natural language and receive an answer to your question—also in plain language. Our goal in this post is to inspire and demonstrate what is possible using this technology.
( 12 min )
With the PockEngine training method, machine-learning models can efficiently and continuously learn from user data on edge devices like smartphones.
( 10 min )
Several recent works have studied the convergence \textit{in high
probability} of stochastic gradient descent (SGD) and its clipped variant.
Compared to vanilla SGD, clipped SGD is practically more stable and has the
additional theoretical benefit of logarithmic dependence on the failure
probability. However, the convergence of other practical nonlinear variants of
SGD, e.g., sign SGD, quantized SGD and normalized SGD, that achieve improved
communication efficiency or accelerated convergence is much less understood. In
this work, we study the convergence bounds \textit{in high probability} of a
broad class of nonlinear SGD methods. For strongly convex loss functions with
Lipschitz continuous gradients, we prove a logarithmic dependence on the
failure probability, even when the noise is heavy-tailed. Strictly more general
than the results for clipped SGD, our results hold for any nonlinearity with
bounded (component-wise or joint) outputs, such as clipping, normalization, and
quantization. Further, existing results with heavy-tailed noise assume bounded
$\eta$-th central moments, with $\eta \in (1,2]$. In contrast, our refined
analysis works even for $\eta=1$, strictly relaxing the noise moment
assumptions in the literature.
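The class of nonlinearities covered is easy to state in code; the following
toy sketch (ours, for illustration only) applies clipping, sign, and
normalization to the stochastic gradient on a strongly convex problem with
heavy-tailed noise:

    import numpy as np

    def clip(g, c=1.0):
        n = np.linalg.norm(g)
        return g if n <= c else (c / n) * g          # joint (norm) clipping

    def sign(g):
        return np.sign(g)                            # component-wise, bounded

    def normalize(g, eps=1e-12):
        return g / (np.linalg.norm(g) + eps)         # unit-norm update

    def nonlinear_sgd(grad, x0, nonlinearity, lr=0.05, steps=2000):
        x = x0.copy()
        for _ in range(steps):
            x -= lr * nonlinearity(grad(x))
        return x

    rng = np.random.default_rng(0)
    # Strongly convex objective ||x||^2 with heavy-tailed gradient noise:
    # Student-t noise with 2 degrees of freedom has a finite mean but infinite
    # variance, i.e., bounded eta-th central moments only for eta < 2.
    grad = lambda x: 2 * x + rng.standard_t(df=2, size=x.shape)
    for nl in (clip, sign, normalize):
        xT = nonlinear_sgd(grad, np.ones(10), nl)
        print(nl.__name__, np.linalg.norm(xT))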
( 2 min )
This paper presents a policy parameterization for learning-based control on
nonlinear, partially-observed dynamical systems. The parameterization is based
on a nonlinear version of the Youla parameterization and the recently proposed
Recurrent Equilibrium Network (REN) class of models. We prove that the
resulting Youla-REN parameterization automatically satisfies stability
(contraction) and user-tunable robustness (Lipschitz) conditions on the
closed-loop system. This means it can be used for safe learning-based control
with no additional constraints or projections required to enforce stability or
robustness. We test the new policy class in simulation on two reinforcement
learning tasks: 1) magnetic suspension, and 2) inverting a rotary-arm pendulum.
We find that the Youla-REN performs similarly to existing learning-based and
optimal control methods while also ensuring stability and exhibiting improved
robustness to adversarial disturbances.
( 2 min )
Determining, understanding, and predicting the so-called structure-property
relation is an important task in many scientific disciplines, such as
chemistry, biology, meteorology, physics, engineering, and materials science.
Structure refers to the spatial distribution of, e.g., substances, material, or
matter in general, while property is a resulting characteristic that usually
depends in a non-trivial way on spatial details of the structure.
Traditionally, forward simulation models have been used for such tasks.
Recently, several machine learning algorithms have been applied in these
scientific fields to enhance and accelerate simulation models or as surrogate
models. In this work, we develop and investigate the applications of six
machine learning techniques based on two different datasets from the domain of
materials science: data from a two-dimensional Ising model for predicting the
formation of magnetic domains and data representing the evolution of dual-phase
microstructures from the Cahn-Hilliard model. We analyze the accuracy and
robustness of all models and elucidate the reasons for the differences in their
performances. The impact of including domain knowledge through tailored
features is studied, and general recommendations based on the availability and
quality of training data are derived from this.
( 2 min )
Partial monitoring is an expressive framework for sequential decision-making
with an abundance of applications, including graph-structured and dueling
bandits, dynamic pricing and transductive feedback models. We survey and extend
recent results on the linear formulation of partial monitoring that naturally
generalizes the standard linear bandit setting. The main result is that a
single algorithm, information-directed sampling (IDS), is (nearly) worst-case
rate optimal in all finite-action games. We present a simple and unified
analysis of stochastic partial monitoring, and further extend the model to the
contextual and kernelized setting.
( 2 min )
Slip and crumple detection is essential for performing robust manipulation
tasks with a robotic hand (RH), such as in remote surgery, and has been one of
the challenging problems in the robotic manipulation community. In this work,
we propose a machine learning (ML)-based technique to detect slip and crumple
as well as the shape of an object currently held in the robotic hand. The
proposed ML model detects the slip, crumple, and shape using the force/torque
exerted and the angular positions of the actuators present in the RH. The
proposed model would be integrated into the loop of a robotic hand (RH) and
haptic glove (HG), helping to reduce latency in the case of teleoperation.
( 2 min )
Large Language Models (LLMs) have demonstrated superior performance in
language understanding benchmarks. CALM, a popular approach, leverages
linguistic priors of LLMs -- GPT-2 -- for action candidate recommendations to
improve the performance in text games in Jericho without environment-provided
actions. However, CALM adapts GPT-2 with annotated human gameplays and keeps
the LLM fixed during the learning of the text-based games. In this work, we
explore and evaluate updating the LLM used for candidate recommendation during
the learning of the text-based game as well, to mitigate the reliance on
human-annotated gameplays, which are costly to acquire. We observe that by
updating the LLM during learning using carefully selected in-game transitions,
we can reduce the dependency on human-annotated gameplays for fine-tuning the
LLMs. We conducted further analysis to study the transferability of the updated
LLMs and observed that transferring in-game trained models to other games did
not result in a consistent transfer.
( 2 min )
The ability to interpret spoken language is connected to natural language
processing. It involves teaching the AI how words relate to one another, how
they are meant to be used, and in what settings. The goal of natural language
processing (NLP) is to get a machine intelligence to process words the same way
a human brain does. This enables machine intelligence to interpret, arrange,
and comprehend textual data by processing the natural language. The technology
can comprehend what is communicated, whether through speech or writing,
because AI processes language more quickly than humans can. In the present
study, five NLP algorithms, namely, Geneism, Sumy, Luhn, Latent Semantic
Analysis (LSA), and the Kullback-Leibler (KL) algorithm, are implemented for
the first time for knowledge summarization of High Entropy Alloys (HEAs). The
performance of these algorithms is assessed using the BLEU score and ROUGE
score. The results showed that the Luhn algorithm has the highest accuracy
score for the knowledge summarization tasks compared to the other algorithms
used.
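For reference, the Luhn summarizer is available off the shelf in the sumy
library; a minimal usage sketch (our illustration of the tooling, not the
study's exact pipeline):

    from sumy.parsers.plaintext import PlaintextParser
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.summarizers.luhn import LuhnSummarizer

    text = "..."  # e.g., a passage about high entropy alloys
    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    summarizer = LuhnSummarizer()
    for sentence in summarizer(parser.document, sentences_count=3):
        print(sentence)   # the 3 highest-scoring sentences under Luhn's heuristic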
( 2 min )
This study presents a physics-informed machine learning-based control method
for nonlinear dynamic systems with highly noisy measurements. Existing
data-driven control methods that use machine learning for system identification
cannot effectively cope with highly noisy measurements, resulting in unstable
control performance. To address this challenge, the present study extends
current physics-informed machine learning capabilities for modeling nonlinear
dynamics with control and integrates them into a model predictive control
framework. To demonstrate the capability of the proposed method, we test and
validate it on two noisy nonlinear dynamic systems: the chaotic Lorenz 3
system and a turning machine tool. Analysis of the results illustrates that
the proposed method outperforms state-of-the-art benchmarks as measured by both modeling
accuracy and control performance for nonlinear dynamic systems under high-noise
conditions.
( 2 min )
The industry of quantum technologies is rapidly expanding, offering promising
opportunities for various scientific domains. Among these emerging
technologies, Quantum Machine Learning (QML) has attracted considerable
attention due to its potential to revolutionize data processing and analysis.
In this paper, we investigate the application of QML in the field of remote
sensing. It is believed that QML can provide valuable insights for analysis of
data from space. We delve into the common beliefs surrounding the quantum
advantage in QML for remote sensing and highlight the open challenges that need
to be addressed. To shed light on the challenges, we conduct a study focused on
the problem of kernel value concentration, a phenomenon that adversely affects
the runtime of quantum computers. Our findings indicate that while this issue
negatively impacts quantum computer performance, it does not entirely negate
the potential quantum advantage in QML for remote sensing.
( 2 min )
This paper proposes a metric to measure the dissimilarity between graphs that
may have a different number of nodes. The proposed metric extends the
generalised optimal subpattern assignment (GOSPA) metric, which is a metric for
sets, to graphs. The proposed graph GOSPA metric includes costs associated with
node attribute errors for properly assigned nodes, missed and false nodes and
edge mismatches between graphs. The computation of this metric is based on
finding the optimal assignments between nodes in the two graphs, with the
possibility of leaving some of the nodes unassigned. We also propose a lower
bound for the metric, which is also a metric for graphs and is computable in
polynomial time using linear programming. The metric is first derived for
undirected unweighted graphs and it is then extended to directed and weighted
graphs. The properties of the metric are demonstrated via simulated and
empirical datasets.
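A minimal sketch of the assignment core (node-attribute costs with an
unassignment penalty, solved as a linear assignment problem; the
edge-mismatch term and the paper's LP lower bound are omitted):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def node_assignment_cost(X, Y, c=1.0):
        """GOSPA-style set dissimilarity between node attribute arrays
        X (n x d) and Y (m x d): attribute distance for assigned pairs,
        penalty c per missed or false (unassigned) node."""
        n, m = len(X), len(Y)
        D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
        big = np.zeros((n + m, n + m))
        big[:n, :m] = np.minimum(D, 2 * c)   # assigning never beats two penalties
        big[:n, m:] = c                      # leave a node of X unassigned
        big[n:, :m] = c                      # leave a node of Y unassigned
        rows, cols = linear_sum_assignment(big)
        return big[rows, cols].sum()

    rng = np.random.default_rng(0)
    print(node_assignment_cost(rng.normal(size=(5, 3)), rng.normal(size=(7, 3))))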
( 2 min )
Despite the significant interest and progress in reinforcement learning (RL)
problems with adversarial corruption, current works are either confined to the
linear setting or lead to an undesired $\tilde{O}(\sqrt{T}\zeta)$ regret bound,
where $T$ is the number of rounds and $\zeta$ is the total amount of
corruption. In this paper, we consider the contextual bandit with general
function approximation and propose a computationally efficient algorithm to
achieve a regret of $\tilde{O}(\sqrt{T}+\zeta)$. The proposed algorithm relies
on the recently developed uncertainty-weighted least-squares regression from
linear contextual bandit and a new weighted estimator of uncertainty for the
general function class. In contrast to the existing analysis that heavily
relies on the linear structure, we develop a novel technique to control the sum
of weighted uncertainty, thus establishing the final regret bounds. We then
generalize our algorithm to the episodic MDP setting and first achieve an
additive dependence on the corruption level $\zeta$ in the scenario of general
function approximation. Notably, our algorithms achieve regret bounds that
either nearly match the performance lower bound or improve upon existing
methods for all corruption levels, in both known and unknown $\zeta$ cases.
( 2 min )
Vision Transformers (ViTs) with self-attention modules have recently achieved
great empirical success in many vision tasks. Due to non-convex interactions
across layers, however, theoretical learning and generalization analysis is
mostly elusive. Based on a data model characterizing both label-relevant and
label-irrelevant tokens, this paper provides the first theoretical analysis of
training a shallow ViT, i.e., one self-attention layer followed by a two-layer
perceptron, for a classification task. We characterize the sample complexity to
achieve a zero generalization error. Our sample complexity bound is positively
correlated with the inverse of the fraction of label-relevant tokens, the token
noise level, and the initial model error. We also prove that a training process
using stochastic gradient descent (SGD) leads to a sparse attention map, which
is a formal verification of the general intuition about the success of
attention. Moreover, this paper indicates that a proper token sparsification
can improve the test performance by removing label-irrelevant and/or noisy
tokens, including spurious correlations. Empirical experiments on synthetic
data and the CIFAR-10 dataset justify our theoretical results and generalize to
deeper ViTs.
( 2 min )
Approximate inference methods like the Laplace method, Laplace approximations
and variational methods, amongst others, are popular when exact inference is
not feasible due to the complexity of the model or the abundance
of data. In this paper we propose a hybrid approximate method called Low-Rank
Variational Bayes correction (VBC), that uses the Laplace method and
subsequently a Variational Bayes correction in a lower dimension, to the joint
posterior mean. The cost is essentially that of the Laplace method which
ensures scalability of the method, in both model complexity and data size.
Models with fixed and unknown hyperparameters are considered, for simulated and
real examples, for small and large datasets.
( 2 min )
We extend PAC-Bayesian theory to generative models and develop generalization
bounds for models based on the Wasserstein distance and the total variation
distance. Our first result on the Wasserstein distance assumes the instance
space is bounded, while our second result takes advantage of dimensionality
reduction. Our results naturally apply to Wasserstein GANs and Energy-Based
GANs, and our bounds provide new training objectives for these two. Although
our work is mainly theoretical, we perform numerical experiments showing
non-vacuous generalization bounds for Wasserstein GANs on synthetic datasets.
( 2 min )
We derive and study time-uniform confidence spheres - termed confidence
sphere sequences (CSSs) - which contain the mean of random vectors with high
probability simultaneously across all sample sizes. Inspired by the original
work of Catoni and Giulini, we unify and extend their analysis to cover both
the sequential setting and to handle a variety of distributional assumptions.
More concretely, our results include an empirical-Bernstein CSS for bounded
random vectors (resulting in a novel empirical-Bernstein confidence interval),
a CSS for sub-$\psi$ random vectors, and a CSS for heavy-tailed random vectors
based on a sequentially valid Catoni-Giulini estimator. Finally, we provide a
version of our empirical-Bernstein CSS that is robust to contamination by Huber
noise.
( 2 min )
Randomized experiments are a powerful methodology for data-driven evaluation
of decisions or interventions. Yet, their validity may be undermined by network
interference. This occurs when the treatment of one unit impacts not only its
outcome but also that of connected units, biasing traditional treatment effect
estimations. Our study introduces a new framework to accommodate complex and
unknown network interference, moving beyond specialized models in the existing
literature. Our framework, which we term causal message-passing, is grounded in
a high-dimensional approximate message passing methodology and is specifically
tailored to experimental design settings with prevalent network interference.
Utilizing causal message-passing, we present a practical algorithm for
estimating the total treatment effect and demonstrate its efficacy in four
numerical scenarios, each with its unique interference structure.
( 2 min )
A mixture of multivariate Poisson-log normal factor analyzers is introduced
by imposing constraints on the covariance matrix, resulting in flexible
models for clustering purposes. In particular, a class of eight parsimonious
mixture models based on the mixtures of factor analyzers model are introduced.
Variational Gaussian approximation is used for parameter estimation, and
information criteria are used for model selection. The proposed models are
explored in the context of clustering discrete data arising from RNA sequencing
studies. Using real and simulated data, the models are shown to give favourable
clustering performance. The GitHub R package for this work is available at
https://github.com/anjalisilva/mixMPLNFA and is released under the open-source
MIT license.
( 2 min )
We provide the first useful, rigorous analysis of ensemble sampling for the
stochastic linear bandit setting. In particular, we show that, under standard
assumptions, for a $d$-dimensional stochastic linear bandit with an interaction
horizon $T$, ensemble sampling with an ensemble of size $m$ on the order of $d
\log T$ incurs regret bounded by order $(d \log T)^{5/2} \sqrt{T}$. Ours is the
first result in any structured setting not to require the size of the ensemble
to scale linearly with $T$ -- which defeats the purpose of ensemble sampling --
while obtaining near $\sqrt{T}$ order regret. Ours is also the first result
that allows infinite action sets.
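A minimal sketch of ensemble sampling for the linear bandit (a toy
illustration of the scheme the paper analyzes, not its proofs): keep m
regularized least-squares models fit to independently perturbed rewards, and
each round act greedily with respect to one member drawn uniformly.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, T, sigma = 5, 30, 2000, 0.5         # ensemble size m ~ d log T
    theta_star = rng.normal(size=d)
    actions = rng.normal(size=(50, d))        # finite action set

    A = np.eye(d)                             # shared Gram matrix (ridge prior)
    B = rng.normal(size=(m, d))               # per-member perturbed statistics
    for t in range(T):
        k = rng.integers(m)                   # sample one ensemble member
        theta_k = np.linalg.solve(A, B[k])
        x = actions[np.argmax(actions @ theta_k)]    # act greedily for member k
        r = x @ theta_star + sigma * rng.normal()    # observe noisy reward
        A += np.outer(x, x)
        # each member regresses on its own independently perturbed reward copy
        B += x[None, :] * (r + sigma * rng.normal(size=(m, 1)))

    print("estimation error:", np.linalg.norm(np.linalg.solve(A, B[0]) - theta_star))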
( 2 min )
Feedforward neural networks (FNNs) are typically viewed as pure prediction
algorithms, and their strong predictive performance has led to their use in
many machine-learning applications. However, their flexibility comes with an
interpretability trade-off; thus, FNNs have been historically less popular
among statisticians. Nevertheless, classical statistical theory, such as
significance testing and uncertainty quantification, is still relevant.
Supplementing FNNs with methods of statistical inference, and covariate-effect
visualisations, can shift the focus away from black-box prediction and make
FNNs more akin to traditional statistical models. This can allow for more
inferential analysis, and, hence, make FNNs more accessible within the
statistical-modelling context.
( 2 min )
Over the last decades, the family of $\alpha$-stable distributions has proven
to be useful for modelling in telecommunication systems. Particularly, in the
case of radar applications, finding a fast and accurate estimation for the
amplitude density function parameters appears to be very important. In this
work, the maximum likelihood estimator (MLE) is proposed for parameters of the
amplitude distribution. To do this, the amplitude data are \emph{projected} on
the horizontal and vertical axes using two simple transformations. It is proved
that the \emph{projected} data follow a zero-location symmetric
$\alpha$-stable distribution for which the MLE can be computed quite fast. The
average of the MLEs computed from the two \emph{projections} is considered as
an estimator for the parameters of the amplitude distribution. The performance
of the proposed \emph{projection} method is demonstrated through a simulation
study and analysis of two sets of real radar data.
( 2 min )
AutoML allows you to derive rapid, general insights from your data right at the beginning of a machine learning (ML) project lifecycle. Understanding up front which preprocessing techniques and algorithm types provide best results reduces the time to develop, train, and deploy the right model. It plays a crucial role in every model’s development process […]
( 14 min )
Llama 2 stands at the forefront of AI innovation, embodying an advanced auto-regressive language model developed on a sophisticated transformer foundation. It’s tailored to address a multitude of applications in both the commercial and research domains with English as the primary linguistic concentration. Its model parameters scale from an impressive 7 billion to a remarkable […]
( 18
min )
An established financial services firm with over 140 years in business, Principal is a global investment management leader and serves more than 62 million customers around the world. Principal is conducting enterprise-scale near-real-time analytics to deliver a seamless and hyper-personalized omnichannel customer experience on their mission to make financial security accessible for all. They are […]
( 10
min )
Prompt engineering has become an essential skill for anyone working with large language models (LLMs) to generate high-quality and relevant texts. Although text prompt engineering has been widely discussed, visual prompt engineering is an emerging field that requires attention. Visual prompts can include bounding boxes or masks that guide vision models in generating relevant and […]
( 13
min )
The telecommunications industry — the backbone of today’s interconnected world — is valued at a staggering $1.7 trillion globally, according to IDC. It’s a massive operation, as telcos process hundreds of petabytes of data in their networks each day. That magnitude is only increasing, as the total amount of data transacted globally is forecast to […]
( 6
min )
Automotive companies are transforming every phase of their product lifecycle — evolving their primarily physical, manual processes into software-driven, AI-enhanced digital systems. To help them save costs and reduce lead times, NVIDIA is announcing two new simulation engines on Omniverse Cloud: the virtual factory simulation engine and the autonomous vehicle (AV) simulation engine. Omniverse Cloud, […]
( 6
min )
As NVIDIA continues to collaborate with Microsoft to build state-of-the-art AI infrastructure, Microsoft is introducing additional H100-based virtual machines to Microsoft Azure to accelerate demanding AI workloads. At its Ignite conference in Seattle today, Microsoft announced its new NC H100 v5 VM series for Azure, the industry’s first cloud instances featuring NVIDIA H100 NVL GPUs. […]
( 5
min )
Today’s landscape of free, open-source large language models (LLMs) is like an all-you-can-eat buffet for enterprises. This abundance can be overwhelming for developers building custom generative AI applications, as they need to navigate unique project and business requirements, including compatibility, security and the data used to train the models. NVIDIA AI Foundation Models — a […]
( 5
min )
Artificial intelligence on Windows 11 PCs marks a pivotal moment in tech history, revolutionizing experiences for gamers, creators, streamers, office workers, students and even casual PC users. It offers unprecedented opportunities to enhance productivity for users of the more than 100 million Windows PCs and workstations that are powered by RTX GPUs. And NVIDIA RTX […]
( 7
min )
Computer vision enables contact-free 3D printing, letting engineers print with high-performance materials they couldn’t use before.
( 11
min )
The usage of Lithium-ion (Li-ion) batteries has gained widespread popularity
across various industries, from powering portable electronic devices to
propelling electric vehicles and supporting energy storage systems. A central
challenge in Li-ion battery reliability lies in accurately predicting their
Remaining Useful Life (RUL), which is a critical measure for proactive
maintenance and predictive analytics. This study presents a novel approach that
harnesses the power of multiple denoising modules, each trained to address
specific types of noise commonly encountered in battery data. Specifically, a
denoising auto-encoder and a wavelet denoiser are used to generate
encoded/decomposed representations, which are subsequently processed through
dedicated self-attention transformer encoders. After extensive experimentation
on NASA and CALCE data, a broad spectrum of health indicator values is
estimated under a set of diverse noise patterns. The reported error metrics on
these data are on par with or better than the state-of-the-art reported in
recent literature.
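As a sketch of one of these modules, the snippet below trains a toy denoising
auto-encoder in PyTorch on synthetic capacity-fade curves; the architecture,
window length, and noise model are assumptions for illustration, and the paper
additionally pairs such encoded representations with a wavelet denoiser and
self-attention transformer encoders.

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    """Maps a noisy degradation window to its clean version."""
    def __init__(self, win=64, code=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(win, 32), nn.ReLU(), nn.Linear(32, code))
        self.dec = nn.Sequential(nn.Linear(code, 32), nn.ReLU(), nn.Linear(32, win))

    def forward(self, x):
        return self.dec(self.enc(x))

torch.manual_seed(0)
t = torch.linspace(0, 1, 64)
slopes = 0.2 + 0.2 * torch.rand(256, 1)
clean = 1.0 - slopes * t                        # synthetic capacity-fade curves
noisy = clean + 0.05 * torch.randn_like(clean)  # additive measurement noise

model = DenoisingAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)  # reconstruct clean signal
    loss.backward()
    opt.step()
print(f"final reconstruction MSE: {loss.item():.5f}")
# model.enc(noisy) is the encoded representation fed to downstream encoders.
```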
( 2
min )
Air quality forecasting has garnered significant attention recently, with
data-driven models taking center stage due to advancements in machine learning
and deep learning models. However, researchers face challenges with complex
data acquisition and the lack of open-sourced datasets, hindering efficient
model validation. This paper introduces PurpleAirSF, a comprehensive and easily
accessible dataset collected from the PurpleAir network. With its high temporal
resolution, various air quality measures, and diverse geographical coverage,
this dataset serves as a useful tool for researchers aiming to develop novel
forecasting models, study air pollution patterns, and investigate their impacts
on health and the environment. We present a detailed account of the data
collection and processing methods employed to build PurpleAirSF. Furthermore,
we conduct preliminary experiments using both classic and modern
spatio-temporal forecasting models, thereby establishing a benchmark for future
air quality forecasting tasks.
( 2
min )
Commonsense norms are defeasible by context: reading books is usually great,
but not when driving a car. While contexts can be explicitly described in
language, in embodied scenarios, contexts are often provided visually. This
type of visually grounded reasoning about defeasible commonsense norms is
generally easy for humans, but (as we show) poses a challenge for machines, as
it necessitates both visual understanding and reasoning about commonsense
norms. We construct a new multimodal benchmark for studying visual-grounded
commonsense norms: NORMLENS. NORMLENS consists of 10K human judgments
accompanied by free-form explanations covering 2K multimodal situations, and
serves as a probe to address two questions: (1) to what extent can models align
with average human judgment? and (2) how well can models explain their
predicted judgments? We find that state-of-the-art model judgments and
explanations are not well-aligned with human annotation. Additionally, we
present a new approach to better align models with humans by distilling social
commonsense knowledge from large language models. The data and code are
released at https://seungjuhan.me/normlens.
( 3
min )
This paper presents a novel fast machine learning method that leverages two
techniques: Vector Embedding on Orthonormal Basis (VEOB) and Spectral Transform
(ST). The VEOB converts the original data encoding into a vector embedding with
coordinates projected onto orthonormal bases. The Singular Value Decomposition
(SVD) technique is used to calculate the vector basis and projection
coordinates, leading to an enhanced distance measurement in the embedding space
and facilitating data compression by preserving the projection vectors
associated with the largest singular values. The ST, in turn, transforms a
sequence of vector data into the spectral domain. By applying the Discrete
Cosine Transform (DCT) and selecting the most significant components, it
streamlines the handling of lengthy vector sequences. The paper provides
examples of word embedding, text chunk embedding, and image embedding,
implemented in the Julia language with a vector database. It also
investigates unsupervised learning and
supervised learning using this method, along with strategies for handling large
data volumes.
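The paper's examples are implemented in Julia; below is a hedged Python sketch
of the two transforms as described, where the matrix sizes, the rank k, and the
number of retained DCT components are arbitrary choices of this sketch.

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 128))            # stand-in data matrix (rows = items)

# VEOB-style step: project rows onto the top-k right singular vectors (an
# orthonormal basis), keeping the directions with the largest singular values.
k = 16
U, s, Vt = np.linalg.svd(X, full_matrices=False)
emb = X @ Vt[:k].T                          # coordinates in the orthonormal basis

# ST-style step: treat a sequence of embeddings as signals, apply the DCT,
# and keep only the most significant low-frequency components.
seq = emb[:100]                             # a length-100 sequence of vectors
C = dct(seq, axis=0, norm='ortho')
C[20:] = 0.0                                # retain 20 spectral components
seq_compressed = idct(C, axis=0, norm='ortho')
print("sequence reconstruction error:",
      np.linalg.norm(seq - seq_compressed) / np.linalg.norm(seq))
```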
( 2
min )
We present and experimentally evaluate using transfer learning to address
experimental data scarcity when training neural network (NN) models for
Mach-Zehnder interferometer mesh-based optical matrix multipliers. Our approach
involves pre-training the model using synthetic data generated from a less
accurate analytical model and fine-tuning with experimental data. Our
investigation demonstrates that this method yields significant reductions in
modeling errors compared to using an analytical model, or a standalone NN model
when training data is limited. Utilizing regularization techniques and ensemble
averaging, we achieve < 1 dB root-mean-square error on the matrix weights
implemented by a 3x3 photonic chip while using only 25% of the available data.
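A hedged PyTorch sketch of the recipe: pre-train a surrogate on plentiful
synthetic data from an approximate model, then fine-tune with a smaller
learning rate on scarce "experimental" data. The functions and network size
below are stand-ins, not the authors' interferometer-mesh model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
true_f = lambda x: torch.sin(3 * x) + 0.3 * x   # stand-in for the experiment
approx_f = lambda x: torch.sin(3 * x)           # stand-in analytical model

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))

def fit(x, y, lr, steps):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    return loss.item()

x_syn = torch.linspace(-2, 2, 2000).unsqueeze(1)   # plentiful synthetic data
fit(x_syn, approx_f(x_syn), lr=1e-3, steps=1000)   # pre-train on the model
x_exp = torch.linspace(-2, 2, 40).unsqueeze(1)     # scarce experimental data
print("fine-tune loss:", fit(x_exp, true_f(x_exp), lr=1e-4, steps=300))
```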
( 2
min )
We introduce a value-based RL agent, which we call BBF, that achieves
super-human performance in the Atari 100K benchmark. BBF relies on scaling the
neural networks used for value estimation, as well as a number of other design
choices that enable this scaling in a sample-efficient manner. We conduct
extensive analyses of these design choices and provide insights for future
work. We end with a discussion about updating the goalposts for
sample-efficient RL research on the ALE. We make our code and data publicly
available at
https://github.com/google-research/google-research/tree/master/bigger_better_faster.
( 2
min )
For a specific class of sparse Gaussian graphical models, we provide a
closed-form solution for the determinant of the covariance matrix. In our
framework, the graphical interaction model (i.e., the covariance selection
model) is equal to the replacement product of $\mathcal{K}_{n}$ and
$\mathcal{K}_{n-1}$, where $\mathcal{K}_n$ is the complete graph with $n$
vertices. Our analysis is based on taking the Fourier transform of the local
factors of the model, which can be viewed as an application of the Normal
Factor Graph Duality Theorem and holographic algorithms. The closed-form
expression is obtained by applying the Matrix Determinant Lemma on the
transformed graphical model. In this context, we will also define a notion of
equivalence between two Gaussian graphical models.
( 2
min )
Bitcoin as a cryptocurrency has been one of the most important digital coins
and the first decentralized digital currency. Deep neural networks, on the
other hand, have shown promising results recently; however, they require a
huge amount of high-quality data to leverage their power. Techniques such as
augmentation can help increase the dataset size, but they cannot be applied
to historical Bitcoin data. As a result, we propose a shallow
Bidirectional-LSTM (Bi-LSTM) model, fed with feature-engineered data using
our proposed method, to forecast Bitcoin closing prices in a daily time
frame. We compare the performance with that of other forecasting methods, and
show that with the help of the proposed feature engineering method, a shallow
neural network outperforms other popular price forecasting models.
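A minimal PyTorch skeleton of a shallow Bi-LSTM forecaster is sketched below;
the window length, hidden size, and raw lagged-price inputs are placeholders,
since the paper's contribution is the feature-engineering step feeding the
network.

```python
import torch
import torch.nn as nn

class BiLSTMForecaster(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                 # x: (batch, window, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict from the final time step

torch.manual_seed(0)
prices = torch.cumsum(torch.randn(500), 0)   # stand-in daily closing prices
win = 30
X = torch.stack([prices[i:i + win]
                 for i in range(len(prices) - win)]).unsqueeze(-1)
y = prices[win:].unsqueeze(-1)

model = BiLSTMForecaster()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(X), y)
    loss.backward()
    opt.step()
print("train MSE:", loss.item())
```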
( 2
min )
Driver stress is a major cause of car accidents and death worldwide.
Furthermore, persistent stress is a health problem, contributing to
hypertension and other diseases of the cardiovascular system. Stress has a
measurable impact on heart and breathing rates and stress levels can be
inferred from such measurements. Galvanic skin response is a common test to
measure the perspiration caused by both physiological and psychological stress,
as well as extreme emotions. In this paper, galvanic skin response is used to
estimate the ground truth stress levels. A feature selection technique based on
the minimal redundancy-maximal relevance method is then applied to multiple
heart rate variability and breathing rate metrics to identify a novel and
optimal combination for use in detecting stress. The support vector machine
algorithm with a radial basis function kernel was used along with these
features to reliably predict stress. The proposed method has achieved a high
level of accuracy on the target dataset.
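A hedged sklearn sketch of the pipeline: greedy mRMR-style selection (maximise
mutual information with the label, minimise average mutual information with
already-selected features) followed by an RBF-kernel SVM. Synthetic features
stand in for the heart rate variability and breathing-rate metrics, and the
labels stand in for GSR-derived stress levels.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           random_state=0)
relevance = mutual_info_classif(X, y, random_state=0)

selected = [int(np.argmax(relevance))]
while len(selected) < 6:
    best, best_score = None, -np.inf
    for j in range(X.shape[1]):
        if j in selected:
            continue
        redundancy = np.mean([mutual_info_regression(
            X[:, [j]], X[:, k], random_state=0)[0] for k in selected])
        score = relevance[j] - redundancy          # mRMR criterion
        if score > best_score:
            best, best_score = j, score
    selected.append(best)

clf = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
print("features:", selected,
      "CV accuracy:", cross_val_score(clf, X[:, selected], y, cv=5).mean())
```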
( 2
min )
Diagonal linear networks (DLNs) are a toy simplification of artificial neural
networks; they consist of a quadratic reparametrization of linear regression
that induces a sparse implicit regularization. In this paper, we describe the
trajectory of the gradient flow of DLNs in the limit of small initialization.
We show that incremental learning is effectively performed in the limit:
coordinates are successively activated, while the iterate is the minimizer of
the loss constrained to have support on the active coordinates only. This shows
that the sparse implicit regularization of DLNs decreases with time. This work
is restricted to the underparametrized regime with anti-correlated features for
technical reasons.
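For concreteness, a standard DLN formulation from this literature (the paper's
exact variant may differ, e.g., in using signed weights):
$$\min_{u,v\in\mathbb{R}^d}\; L(u,v) \;=\; \frac{1}{2n}\,\bigl\|X(u\odot v)-y\bigr\|_2^2, \qquad \beta \;=\; u\odot v,$$
with gradient flow initialized at $u(0)=v(0)=\alpha\mathbf{1}$, $\alpha\to 0$;
in this limit the coordinates of $\beta$ activate one at a time, which is the
incremental-learning behaviour described above.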
( 2
min )
Path reasoning methods over knowledge graphs have gained popularity for their
potential to improve transparency in recommender systems. However, the
resulting models still rely on pre-trained knowledge graph embeddings, fail to
fully exploit the interdependence between entities and relations in the KG for
recommendation, and may generate inaccurate explanations. In this paper, we
introduce PEARLM, a novel approach that efficiently captures user behaviour and
product-side knowledge through language modelling. With our approach, knowledge
graph embeddings are directly learned from paths over the KG by the language
model, which also unifies entities and relations in the same optimisation
space. Constraints on the sequence decoding additionally guarantee path
faithfulness with respect to the KG. Experiments on two datasets show the
effectiveness of our approach compared to state-of-the-art baselines. Source
code and datasets: AVAILABLE AFTER GETTING ACCEPTED.
( 2
min )
To facilitate reliable deployments of autonomous robots in the real world,
Out-of-Distribution (OOD) detection capabilities are often required. A powerful
approach for OOD detection is based on density estimation with Normalizing
Flows (NFs). However, we find that prior work with NFs attempts to match the
complex target distribution topologically with naive base distributions leading
to adverse implications. In this work, we circumvent this topological mismatch
using an expressive class-conditional base distribution trained with an
information-theoretic objective to match the required topology. The proposed
method enjoys the merits of wide compatibility with existing learned models,
with no performance degradation and minimal computational overhead, while
enhancing OOD detection capabilities. We demonstrate superior results in
density estimation and 2D object detection benchmarks in comparison with
extensive baselines. Moreover, we showcase the applicability of the method with
a real-robot deployment.
( 2
min )
Recommender systems are a vital information service on today's Internet.
Recently, graph neural networks have emerged as the leading approach for
recommender systems. We review recent literature on graph neural
network-based recommender systems, covering the background and development of
both recommender systems and graph neural networks. Categorizing recommender
systems by their settings and graph neural networks into spectral and spatial
models, we then explore the motivation behind incorporating graph neural
networks into recommender systems. We also analyze challenges and open problems
in graph construction, embedding propagation and aggregation, and computation
efficiency. This guides us to better explore the future directions and
developments in this domain.
( 2
min )
Computer-assisted methods have emerged as valuable tools for retrosynthesis
analysis. However, quantifying the plausibility of generated retrosynthesis
routes remains a challenging task. We introduce Retro-BLEU, a statistical
metric adapted from the well-established BLEU score in machine translation, to
evaluate the plausibility of retrosynthesis routes based on analysis of
reaction template sequences. We demonstrate the effectiveness of Retro-BLEU by applying
it to a diverse set of retrosynthesis routes generated by state-of-the-art
algorithms and compare the performance with other evaluation metrics. The
results show that Retro-BLEU is capable of differentiating between plausible
and implausible routes. Furthermore, we provide insights into the strengths and
weaknesses of Retro-BLEU, paving the way for future developments and
improvements in this field.
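The flavor of the metric can be sketched with BLEU-style n-gram precision over
template-ID sequences; the template IDs, the n-gram orders, and the absence of
any brevity-style correction below are assumptions of this sketch rather than
the paper's exact definition.

```python
import math
from collections import Counter

def ngrams(seq, n):
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))

def route_bleu(candidate, references, max_n=2):
    # Geometric mean of n-gram precisions against the best-matching reference.
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        best_overlap = max(
            sum((cand & ngrams(ref, n)).values()) for ref in references)
        total = max(sum(cand.values()), 1)
        log_precisions.append(math.log(max(best_overlap, 1e-9) / total))
    return math.exp(sum(log_precisions) / max_n)

candidate = ["t1", "t4", "t2", "t7"]                   # reaction-template IDs
references = [["t1", "t4", "t2", "t9"], ["t3", "t4", "t2", "t7"]]
print(f"plausibility score: {route_bleu(candidate, references):.3f}")
```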
( 2
min )
We show how to compute the elements of a sequence $x_t = a_t x_{t-1} + b_t$
in parallel, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in
\mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel
processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time
and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science
and engineering, making their parallelization useful for a vast number of
applications. We implement parallelization in software, test it on parallel
hardware, and verify that it executes faster than sequential computation by a
factor of $\frac{n}{\log n}$.
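Concretely, the recurrence composes affine maps $f_t(x) = a_t x + b_t$, and
composition of affine maps is associative: $(f_2 \circ f_1)(x) = (a_2 a_1)x +
(a_2 b_1 + b_2)$. All prefixes can therefore be computed with a parallel scan.
The numpy sketch below emulates the scan with log-n vectorized sweeps and
checks it against sequential evaluation; true wall-clock speedups of course
require parallel hardware (e.g., via jax.lax.associative_scan).

```python
import numpy as np

def affine_scan(a, b):
    # Hillis-Steele-style inclusive scan under affine-map composition.
    A, B = a.copy(), b.copy()
    d = 1
    while d < len(A):
        A2, B2 = A.copy(), B.copy()
        A2[d:] = A[d:] * A[:-d]            # a_later * a_earlier
        B2[d:] = A[d:] * B[:-d] + B[d:]    # a_later * b_earlier + b_later
        A, B, d = A2, B2, 2 * d
    return A, B                            # x_t = A[t] * x_0 + B[t]

rng = np.random.default_rng(0)
n, x0 = 1024, 0.5
a, b = rng.uniform(0.9, 1.1, n), rng.normal(size=n)

A, B = affine_scan(a, b)
x_par = A * x0 + B

x_seq, x = np.empty(n), x0                 # reference sequential computation
for t in range(n):
    x = a[t] * x + b[t]
    x_seq[t] = x
print("max |parallel - sequential|:", np.max(np.abs(x_par - x_seq)))
```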
( 2
min )
ODTLearn is an open-source Python package that provides methods for learning
optimal decision trees for high-stakes predictive and prescriptive tasks based
on the mixed-integer optimization (MIO) framework proposed in Aghaei et al.
(2019) and several of its extensions. The current version of the package
provides implementations for learning optimal classification trees, optimal
fair classification trees, optimal classification trees robust to distribution
shifts, and optimal prescriptive trees from observational data. We have
designed the package to be easy to maintain and extend as new optimal decision
tree problem classes, reformulation strategies, and solution algorithms are
introduced. To this end, the package follows object-oriented design principles
and supports both commercial (Gurobi) and open source (COIN-OR branch and cut)
solvers. The package documentation and an extensive user guide can be found at
https://d3m-research-group.github.io/odtlearn/. Additionally, users can view
the package source code and submit feature requests and bug reports by visiting
https://github.com/D3M-Research-Group/odtlearn.
( 2
min )
We investigate the long-run behavior of single-server queues with Hawkes
arrivals and general service distributions and related optimization problems.
In detail, utilizing novel coupling techniques, we establish finite moment
bounds for the stationary distribution of the workload and busy period
processes. In addition, we show that these queueing processes
converge exponentially fast to their stationary distributions. Based on these
theoretical results, we develop an efficient numerical algorithm to solve the
optimal staffing problem for the Hawkes queues in a data-driven manner.
Numerical results indicate a sharp difference in staffing for Hawkes queues,
compared to the classic GI/GI/1 model, especially in the heavy-traffic regime.
( 2
min )
Gaussian Mixture Models (GMMs) are one of the most potent parametric density
models used extensively in many applications. Flexibly-tied factorization of
the covariance matrices in GMMs is a powerful approach for coping with the
challenges of common GMMs when faced with high-dimensional data and complex
densities which often demand a large number of Gaussian components. However,
the expectation-maximization algorithm for fitting flexibly-tied GMMs still
encounters difficulties with streaming and very large dimensional data. To
overcome these challenges, this paper suggests the use of first-order
stochastic optimization algorithms. Specifically, we propose a new stochastic
optimization algorithm on the manifold of orthogonal matrices. Through numerous
empirical results on both synthetic and real datasets, we observe that
stochastic optimization methods can outperform the expectation-maximization
algorithm in terms of attaining better likelihood, needing fewer epochs for
convergence, and consuming less time per epoch.
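The core primitive can be sketched generically: project the Euclidean gradient
onto the tangent space of the orthogonal-matrix manifold and retract back with
a QR decomposition. This is a textbook Riemannian SGD step on a toy objective,
not necessarily the paper's exact update rule.

```python
import numpy as np

def stiefel_sgd_step(W, euclid_grad, lr=1e-2):
    sym = 0.5 * (W.T @ euclid_grad + euclid_grad.T @ W)
    riem_grad = euclid_grad - W @ sym           # tangent-space projection
    Q, R = np.linalg.qr(W - lr * riem_grad)     # QR retraction to the manifold
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs                            # fix column signs for uniqueness

rng = np.random.default_rng(0)
n = 8
W, _ = np.linalg.qr(rng.normal(size=(n, n)))    # random orthogonal start
M = rng.normal(size=(n, n))                     # toy objective: min ||W - M||_F^2
for _ in range(500):
    W = stiefel_sgd_step(W, 2 * (W - M), lr=1e-2)
print("orthogonality error:", np.linalg.norm(W.T @ W - np.eye(n)))
```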
( 2
min )
Recent works have shown that physics-inspired architectures allow the
training of deep graph neural networks (GNNs) without oversmoothing. The role
of these physics is unclear, however, with successful examples of both
reversible (e.g., Hamiltonian) and irreversible (e.g., diffusion) phenomena
producing comparable results despite diametrically opposed mechanisms, and
further complications arising due to empirical departures from mathematical
theory. This work presents a series of novel GNN architectures based upon
structure-preserving bracket-based dynamical systems, which are provably
guaranteed to either conserve energy or generate positive dissipation with
increasing depth. It is shown that the theoretically principled framework
employed here allows for inherently explainable constructions, which
contextualize departures from theory in current architectures and better
elucidate the roles of reversibility and irreversibility in network
performance.
( 2
min )
In recent years, language-driven artistic style transfer has emerged as a new
type of style transfer technique, eliminating the need for a reference style
image by using natural language descriptions of the style. The first model to
achieve this, called CLIPstyler, has demonstrated impressive stylisation
results. However, its lengthy optimisation procedure at runtime for each query
limits its suitability for many practical applications. In this work, we
present FastCLIPstyler, a generalised text-based image style transfer model
capable of stylising images in a single forward pass for arbitrary text inputs.
Furthermore, we introduce EdgeCLIPstyler, a lightweight model designed for
compatibility with resource-constrained devices. Through quantitative and
qualitative comparisons with state-of-the-art approaches, we demonstrate that
our models achieve superior stylisation quality based on measurable metrics
while offering significantly improved runtime efficiency, particularly on edge
devices.
( 2
min )
Associative memory architectures are designed for memorization but also
offer, through their retrieval method, a form of generalization to unseen
inputs: stored memories can be seen as prototypes from this point of view.
Focusing on Modern Hopfield Networks (MHN), we show that a large memorization
capacity undermines the generalization opportunity. We offer a solution to
better optimize this tradeoff. It relies on Minimum Description Length (MDL) to
determine during training which memories to store, as well as how many of them.
( 2
min )
We identify hidden layers inside a deep neural network (DNN) with group
actions on the data domain, and formulate a formal deep network as a dual voice
transform with respect to the Koopman operator, a linear representation of the
group action. Based on the group theoretic arguments, particularly by using
Schur's lemma, we show a simple proof of the universality of DNNs.
( 2
min )
Compared to "black-box" models, like random forests and deep neural networks,
explainable boosting machines (EBMs) are considered "glass-box" models that can
be competitively accurate while also maintaining a higher degree of
transparency and explainability. However, EBMs become readily less transparent
and harder to interpret in high-dimensional settings with many predictor
variables; they also become more difficult to use in production due to
increases in scoring time. We propose a simple solution based on the least
absolute shrinkage and selection operator (LASSO) that can help introduce
sparsity by reweighting the individual model terms and removing the less
relevant ones, thereby allowing these models to maintain their transparency and
relatively fast scoring times in higher-dimensional settings. In short,
post-processing a fitted EBM with many (i.e., possibly hundreds or thousands)
of terms using the LASSO can help reduce the model's complexity and drastically
improve scoring time. We illustrate the basic idea using two real-world
examples with code.
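A minimal sketch of the post-processing idea, assuming access to the matrix of
per-term contributions of a fitted EBM; synthetic shape functions stand in for
that matrix here, so no EBM library API is invoked.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Treat each fitted term's contribution f_j(x_j) as a feature and fit an
# L1-penalized regression of the response on those contributions. Terms with
# zero coefficients are dropped; the rest are reweighted.
rng = np.random.default_rng(0)
n, p = 1000, 50
X = rng.normal(size=(n, p))
shapes = [np.sin, np.tanh, np.square] + [lambda x: 0.0 * x] * (p - 3)
F = np.column_stack([f(X[:, j]) for j, f in enumerate(shapes)])  # term outputs
y = F.sum(axis=1) + 0.1 * rng.normal(size=n)

lasso = LassoCV(cv=5).fit(F, y)            # reweight the term contributions
kept = np.flatnonzero(lasso.coef_)
print(f"kept {kept.size}/{p} terms:", kept)
# Scoring now only evaluates the surviving terms, each scaled by lasso.coef_.
```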
( 2
min )
We analyze geometric aspects of the gradient descent algorithm in Deep
Learning (DL) networks. In particular, we prove that the globally minimizing
weights and biases for the $\mathcal{L}^2$ cost obtained constructively in
[Chen-Munoz Ewald 2023] for underparametrized ReLU DL networks can generically
not be approximated via the gradient descent flow. We therefore conclude that
the method introduced in [Chen-Munoz Ewald 2023] is disjoint from the gradient
descent method.
( 2
min )
We derive the first large deviation rate function for the stochastic iterates
generated by policy gradient methods with a softmax parametrization and an
entropy regularized objective. Leveraging the contraction principle from large
deviations theory, we also develop a general recipe for deriving exponential
convergence rates for a wide spectrum of other policy parametrizations. This
approach unifies several results from the literature and simplifies existing
proof techniques.
( 2
min )
Learning nonparametric systems of Ordinary Differential Equations (ODEs)
$\dot{x} = f(t,x)$ from noisy data is an emerging machine learning topic. We use the
well-developed theory of Reproducing Kernel Hilbert Spaces (RKHS) to define
candidates for f for which the solution of the ODE exists and is unique.
Learning f consists of solving a constrained optimization problem in an RKHS.
We propose a penalty method that iteratively uses the Representer theorem and
Euler approximations to provide a numerical solution. We prove a generalization
bound for the $L^2$ distance between $x$ and its estimator and provide experimental
comparisons with the state-of-the-art.
( 2
min )
For differential privacy under sub-Gamma noise, we derive the asymptotic
properties of a class of binary-valued network models with a general link
function. In this paper, we release the degree sequences of the binary
networks under a general noisy mechanism, with the discrete Laplace mechanism
as a special case. We establish asymptotic results, including both
consistency and asymptotic normality of the parameter estimator, as the
number of parameters goes to infinity in a class of network models. Simulations
and a real data example are provided to illustrate the asymptotic results.
( 2
min )
Most prognostic methods require a decent amount of data for model training.
In reality, however, the amount of historical data owned by a single
organization might be small or not large enough to train a reliable prognostic
model. To address this challenge, this article proposes a federated prognostic
model that allows multiple users to jointly construct a failure time prediction
model using their multi-stream, high-dimensional, and incomplete data while
keeping each user's data local and confidential. The prognostic model first
employs multivariate functional principal component analysis to fuse the
multi-stream degradation signals. Then, the fused features coupled with the
times-to-failure are utilized to build a (log)-location-scale regression model
for failure prediction. To estimate parameters using distributed datasets and
keep the data privacy of all participants, we propose a new federated algorithm
for feature extraction. Numerical studies indicate that the performance of the
proposed model is the same as that of classic non-federated prognostic models
and is better than that of the models constructed by each user itself.
( 2
min )
This paper introduces $\textit{arfpy}$, a Python implementation of
Adversarial Random Forests (ARF) (Watson et al., 2023), which is a lightweight
procedure for synthesizing new data that resembles some given data. The
software $\textit{arfpy}$ equips practitioners with straightforward
functionalities for both density estimation and generative modeling. The method
is particularly useful for tabular data and its competitive performance is
demonstrated in previous literature. As a major advantage over the mostly deep
learning based alternatives, $\textit{arfpy}$ combines the method's reduced
requirements in tuning efforts and computational resources with a user-friendly
Python interface. This supplies audiences across scientific fields with
software to generate data effortlessly.
( 2
min )
We derive the existence of a new type of neural network, called a compact
matrix quantum group equivariant neural network, that learns from data that has
an underlying quantum symmetry. We apply the Woronowicz formulation of
Tannaka-Krein duality to characterise the weight matrices that appear in these
neural networks for any easy compact matrix quantum group. We show that compact
matrix quantum group equivariant neural networks contain, as a subclass, all
compact matrix group equivariant neural networks. Moreover, we obtain
characterisations of the weight matrices for many compact matrix group
equivariant neural networks that have not previously appeared in the machine
learning literature.
( 2
min )
Online communities are driving user engagement across industries like gaming, social media, ecommerce, dating, and e-learning. Members of these online communities trust platform owners to provide a safe and inclusive environment where they can freely consume content and contribute. Content moderators are often employed to review user-generated content and check that it’s safe and compliant […]
( 7
min )
Today, we are excited to announce the capability to fine-tune the Mistral 7B model using Amazon SageMaker JumpStart. You can now fine-tune and deploy Mistral text generation models on SageMaker JumpStart using the Amazon SageMaker Studio UI with a few clicks or using the SageMaker Python SDK. Foundation models perform very well with generative tasks, […]
( 10
min )
Fake news, defined as news that conveys or incorporates false, fabricated, or deliberately misleading information, has been around as early as the emergence of the printing press. The rapid spread of fake news and disinformation online is not only deceiving to the public, but can also have a profound impact on society, politics, economy, and […]
( 17
min )
In the era of big data and AI, companies are continually seeking ways to use these technologies to gain a competitive edge. One of the hottest areas in AI right now is generative AI, and for good reason. Generative AI offers powerful solutions that push the boundaries of what’s possible in terms of creativity and […]
( 13
min )
Data analysts must have a strong grasp of practical data visualization skills to paint a clear picture of complex data for a broader audience. Seeing the big picture by delivering coherent and easily comprehensible content is crucial. Companies highly value avant-garde data analysts who can not only dig into the data but also connect the…
The post Beyond the numbers: The soft skills that elevate data analysts to the next level appeared first on Data Science Central.
( 21
min )
Given how quickly the digital marketing industry changes, keeping up with the most recent trends and technologies can be challenging. Traditional marketing techniques are no longer enough to connect with and engage your target audience. Machine learning in marketing has got your back. Check out the top 10 use cases and implementation pointers that offer…
The post Machine learning in marketing: 10 use cases and implementation tips appeared first on Data Science Central.
( 24
min )
In the last two years, I published 5 machine learning and AI books, including one on synthetic data by Elsevier. This represents over 800 pages of compact, state-of-the-art material. The new addition features my most recent advances: the problems that I encountered with generative adversarial networks, and how I overcame them with new techniques. The…
The post New Book: Statistical Optimization for GenAI and Machine Learning appeared first on Data Science Central.
( 21
min )
Character animator Sir Wade Neistadt works to make animation and 3D education more accessible for aspiring and professional artists alike through video tutorials and industry training.
( 8
min )
Sparse matrix representations are ubiquitous in computational science and
machine learning, leading to significant reductions in compute time, in
comparison to dense representation, for problems that have local connectivity.
The adoption of sparse representation in leading ML frameworks such as PyTorch
is incomplete, however, with support for both automatic differentiation and GPU
acceleration missing. In this work, we present an implementation of a CSR-based
sparse matrix wrapper for PyTorch with CUDA acceleration for basic matrix
operations, as well as automatic differentiability. We also present several
applications of the resulting sparse kernels to optimization problems,
demonstrating ease of implementation and performance measurements versus their
dense counterparts.
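For context, the CSR layout itself, shown with a plain-Python matvec; the
library's actual contribution (CUDA kernels plus autograd support for these
operations) is not reproduced here.

```python
import numpy as np

# CSR (compressed sparse row): nonzero values, their column indices, and row
# pointers delimiting each row's slice. A dense 4x4 with 5 nonzeros:
#   [[1 0 0 2]
#    [0 3 0 0]
#    [0 0 0 0]
#    [4 0 5 0]]
values  = np.array([1., 2., 3., 4., 5.])
col_idx = np.array([0, 3, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 3, 5])   # row i spans values[row_ptr[i]:row_ptr[i+1]]

def csr_matvec(values, col_idx, row_ptr, x):
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        y[i] = values[lo:hi] @ x[col_idx[lo:hi]]
    return y

x = np.array([1., 1., 1., 1.])
print(csr_matvec(values, col_idx, row_ptr, x))   # [3. 3. 0. 9.]
```

In the setting described above, both this forward matvec and its backward pass
are sparse operations; the point of the wrapper is to run them on GPU under
automatic differentiation.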
( 2
min )
Deep learning-based vision is characterized by intricate frameworks that
often necessitate a profound understanding, presenting a barrier to newcomers
and limiting broad adoption. With many researchers grappling with the
constraints of smaller datasets, there's a pronounced reliance on pre-trained
neural networks, especially for tasks such as image classification. This
reliance is further intensified in niche imaging areas where obtaining vast
datasets is challenging. Despite the widespread use of transfer learning as a
remedy to the small dataset dilemma, a conspicuous absence of tailored auto-ML
solutions persists. Addressing these challenges is "Deep Fast Vision", a Python
library that streamlines the deep learning process. This tool offers a
user-friendly experience, enabling results through a simple nested-dictionary
definition, helping to democratize deep learning for non-experts. Designed for
simplicity and scalability, Deep Fast Vision serves as a bridge, connecting
the complexities of existing deep learning frameworks with the needs of a
diverse user base.
( 2
min )
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that solves two problems from a functional analytic
approach: first, it finds a smooth functional estimate of a density function,
whether it is normalized or not; second, the algorithm provides an estimate of
the normalizing weight. In the context of Bayesian inference, OPAA provides an
estimate of the posterior function as well as the normalizing weight, which is
also known as the evidence.
A core component of OPAA is a special transform of the square root of the
joint distribution into a special functional space of our construct. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. The computations can be parallelized and
completed in one pass.
To compute the transform coefficients, OPAA proposes a new computational
scheme leveraging Gauss--Hermite quadrature in higher dimensions. Not only does
it avoid the potential high variance problem associated with random sampling
methods, it also enables one to speed up the computation by parallelization,
and significantly reduces the complexity by a vector decomposition.
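To make the construction concrete, here is a one-dimensional sketch under
assumptions of this example: an orthonormal Hermite-function basis, a toy
Gaussian target, and fixed truncation and quadrature levels. The evidence is
recovered as the sum of squared expansion coefficients of the square root of
the unnormalized density.

```python
import numpy as np

def hermite_functions(x, K):
    # Orthonormal Hermite functions psi_k on L^2(R), via the standard recurrence.
    psi = np.empty((K, len(x)))
    psi[0] = np.pi ** -0.25 * np.exp(-x**2 / 2)
    if K > 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(2, K):
        psi[k] = np.sqrt(2.0 / k) * x * psi[k-1] - np.sqrt((k - 1) / k) * psi[k-2]
    return psi

unnorm = lambda th: np.exp(-0.5 * (th - 0.7)**2)   # toy unnormalized posterior
true_Z = np.sqrt(2 * np.pi)                        # its exact evidence

K = 30
nodes, weights = np.polynomial.hermite.hermgauss(40)   # weight e^{-x^2}
psi = hermite_functions(nodes, K)
# c_k = int sqrt(p(x)) psi_k(x) dx, rewritten against the e^{-x^2} weight:
integrand = np.sqrt(unnorm(nodes)) * np.exp(nodes**2)
coeffs = psi @ (weights * integrand)
print("evidence estimate:", np.sum(coeffs**2), "exact:", true_Z)
```

In higher dimensions, the vector decomposition and the one-pass, parallel
computation of the coefficients described above become the key ingredients.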
( 2
min )
Recent advances have shown that Gaussian process (GP) priors, or their finite realisations, can
be encoded using deep generative models such as variational autoencoders
(VAEs). These learned generators can serve as drop-in replacements for the
original priors during MCMC inference. While this approach enables efficient
inference, it loses information about the hyperparameters of the original
models, and consequently makes inference over hyperparameters impossible and
the learned priors indistinct. To overcome this limitation, we condition the
VAE on stochastic process hyperparameters. This allows the joint encoding of
hyperparameters with GP realizations and their subsequent estimation during
inference. Further, we demonstrate that our proposed method, PriorCVAE, is
agnostic to the nature of the models which it approximates, and can be used,
for instance, to encode solutions of ODEs. It provides a practical tool for
approximate inference and shows potential in real-life spatial and
spatiotemporal applications.
( 2
min )
The simulation of power system dynamics poses a computationally expensive
task. Considering the growing uncertainty of generation and demand patterns,
thousands of scenarios need to be continuously assessed to ensure the safety of
power systems. Physics-Informed Neural Networks (PINNs) have recently emerged
as a promising solution for drastically accelerating computations of non-linear
dynamical systems. This work investigates the applicability of these methods
for power system dynamics, focusing on the dynamic response to load
disturbances. Comparing the prediction of PINNs to the solution of conventional
solvers, we find that PINNs can be 10 to 1000 times faster than conventional
solvers. At the same time, we find them to be sufficiently accurate and
numerically stable even for large time steps. To facilitate a deeper
understanding, this paper also presents a new regularisation of Neural Network
(NN) training by introducing a gradient-based term in the loss function. The
resulting NNs, which we call dtNNs, help us deliver a comprehensive analysis
about the strengths and weaknesses of the NN based approaches, how
incorporating knowledge of the underlying physics affects NN performance, and
how this compares with conventional solvers for power system dynamics.
( 2
min )
The relentless pursuit of miniaturization and performance enhancement in
electronic devices has led to a fundamental challenge in the field of circuit
design and simulation: how to accurately account for the inherent stochastic
nature of certain devices. While conventional deterministic models have served
as indispensable tools for circuit designers, they fall short when it comes to
capturing the subtle yet critical variability exhibited by many electronic
components. In this paper, we present an innovative approach that transcends
the limitations of traditional modeling techniques by harnessing the power of
machine learning, specifically Mixture Density Networks (MDNs), to faithfully
represent and simulate the stochastic behavior of electronic devices. We
demonstrate our approach to model heater cryotrons, where the model is able to
capture the stochastic switching dynamics observed in the experiment. Our model
achieves a mean absolute error of 0.82% for switching probability. This paper marks a
significant step forward in the quest for accurate and versatile compact
models, poised to drive innovation in the realm of electronic circuits.
( 2
min )
We investigate how shallow ReLU networks interpolate between known regions.
Our analysis shows that empirical risk minimizers converge to a minimum norm
interpolant as the number of data points and parameters tends to infinity, when
a weight-decay regularizer is applied with a coefficient that vanishes at a
precise rate as the network width and the number of data points grow. With and
without explicit regularization, we numerically study the implicit bias of
common optimization algorithms towards known minimum norm interpolants.
( 2
min )
A classic inferential statistical problem is the goodness-of-fit (GOF) test.
Such a test can be challenging when the hypothesized parametric model has an
intractable likelihood and its distributional form is not available. Bayesian
methods for GOF can be appealing due to their ability to incorporate expert
knowledge through prior distributions.
However, standard Bayesian methods for this test often require strong
distributional assumptions on the data and their relevant parameters. To
address this issue, we propose a semi-Bayesian nonparametric (semi-BNP)
procedure in the context of the maximum mean discrepancy (MMD) measure that can
be applied to the GOF test. Our method introduces a novel Bayesian estimator
for the MMD, enabling the development of a measure-based hypothesis test for
intractable models. Through extensive experiments, we demonstrate that our
proposed test outperforms frequentist MMD-based methods by achieving lower
false rejection and false acceptance rates of the null hypothesis. Furthermore, we
showcase the versatility of our approach by embedding the proposed estimator
within a generative adversarial network (GAN) framework. It facilitates a
robust BNP learning approach as another significant application of our method.
With our BNP procedure, this new GAN approach can enhance sample diversity and
improve inferential accuracy compared to traditional techniques.
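For reference, the classical unbiased estimator of the squared MMD with an RBF
kernel is sketched below; the paper's contribution is a Bayesian estimator of
this quantity, which is not reproduced here, and the kernel bandwidth and toy
data are arbitrary choices.

```python
import numpy as np

def mmd2_unbiased(X, Y, gamma=1.0):
    # Unbiased estimate of MMD^2 with kernel k(a, b) = exp(-gamma * ||a-b||^2).
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0)          # drop diagonal terms for unbiasedness
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 1))   # observed data
Y = rng.normal(0.0, 1.0, size=(500, 1))   # draws from the hypothesized model
print("MMD^2 (same dist):", mmd2_unbiased(X, Y))
print("MMD^2 (shifted):  ", mmd2_unbiased(X, Y + 1.0))
```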
( 3
min )
Real-time density estimation is ubiquitous in many applications, including
computer vision and signal processing. Kernel density estimation is arguably
one of the most commonly used density estimation techniques, and the use of a
"sliding window" mechanism adapts kernel density estimators to dynamic
processes. In this paper, we derive the asymptotic mean integrated squared
error (AMISE) upper bound for the "sliding window" kernel density estimator.
This upper bound provides a principled guide to devise a novel estimator, which
we name the temporal adaptive kernel density estimator (TAKDE). Compared to
heuristic approaches for the "sliding window" kernel density estimator, TAKDE is
theoretically optimal in terms of the worst-case AMISE. We provide numerical
experiments using synthetic and real-world datasets, showing that TAKDE
outperforms other state-of-the-art dynamic density estimators (including those
outside of kernel family). In particular, TAKDE achieves a superior test
log-likelihood with a smaller runtime.
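For context, here is the baseline "sliding window" Gaussian KDE that TAKDE
refines; Silverman's rule below is a placeholder bandwidth, not TAKDE's
AMISE-driven choice, and the drifting stream is a toy example.

```python
import numpy as np

def sliding_window_kde(stream, grid, W=200):
    # Density estimate at time t uses only the most recent W samples.
    estimates = []
    for t in range(W, len(stream)):
        window = stream[t - W:t]
        h = 1.06 * np.std(window) * W ** (-1 / 5)     # Silverman bandwidth
        diffs = (grid[:, None] - window[None, :]) / h
        dens = np.exp(-0.5 * diffs**2).sum(axis=1) / (W * h * np.sqrt(2 * np.pi))
        estimates.append(dens)
    return np.array(estimates)

rng = np.random.default_rng(0)
drift = np.linspace(0, 3, 2000)                       # slowly drifting mean
stream = rng.normal(loc=drift, scale=1.0)
grid = np.linspace(-4, 8, 100)
dens = sliding_window_kde(stream, grid)
print(dens.shape)                                     # (1800, 100)
```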
( 2
min )
Forecasting healthcare time series is crucial for early detection of adverse
outcomes and for patient monitoring. Forecasting, however, can be difficult in
practice due to noisy and intermittent data. The challenges are often
exacerbated by change points induced via extrinsic factors, such as the
administration of medication. To address these challenges, we propose a novel
hybrid global-local architecture and a pharmacokinetic encoder that informs
deep learning models of patient-specific treatment effects. We showcase the
efficacy of our approach in achieving significant accuracy gains for a blood
glucose forecasting task using both realistically simulated and real-world
data. Our global-local architecture improves over patient-specific models by
9.2-14.6%. Additionally, our pharmacokinetic encoder improves over alternative
encoding techniques by 4.4% on simulated data and 2.1% on real-world data. The
proposed approach can have multiple beneficial applications in clinical
practice, such as issuing early warnings about unexpected treatment responses,
or helping to characterize patient-specific treatment effects in terms of drug
absorption and elimination characteristics.
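For concreteness, the classic one-compartment pharmacokinetic (Bateman) curve
that such an encoder can parametrize is shown below; this is textbook PK, and
the paper learns patient-specific effects rather than fixing the rates as done
here.

```python
import numpy as np

def pk_response(t, dose=1.0, ka=1.2, ke=0.3):
    # One-compartment model: dose with absorption rate ka, elimination rate ke.
    return dose * ka / (ka - ke) * (np.exp(-ke * t) - np.exp(-ka * t))

t = np.linspace(0, 24, 97)                # hours after administration
effect = pk_response(t)
print("peak effect at t =", t[np.argmax(effect)], "h")
```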
( 2
min )
Progressing towards a new era of Artificial Intelligence (AI)-enabled
wireless networks, concerns regarding the environmental impact of AI have been
raised in both industry and academia. Federated Learning (FL) has emerged as a
key privacy-preserving decentralized AI technique. Despite efforts currently
being made in FL, its environmental impact is still an open problem. Targeting
the minimization of the overall energy consumption of an FL process, we propose
the orchestration of computational and communication resources of the involved
devices to minimize the total energy required, while guaranteeing a certain
performance of the model. To this end, we propose a Soft Actor Critic Deep
Reinforcement Learning (DRL) solution, where a penalty function is introduced
during training, penalizing the strategies that violate the constraints of the
environment, and contributing towards a safe RL process. A device-level
synchronization method, along with a computationally cost-effective FL
environment, is proposed, with the goal of further reducing the energy
consumption and communication overhead. Evaluation results show the
effectiveness and robustness of the proposed scheme compared to four
state-of-the-art baseline solutions on different network environments and FL
architectures, achieving a decrease of up to 94% in the total energy
consumption.
( 3
min )
Early-exit neural networks (EENNs) facilitate adaptive inference by producing
predictions at multiple stages of the forward pass. In safety-critical
applications, these predictions are only meaningful when complemented with
reliable uncertainty estimates. Yet, due to their sequential structure, an
EENN's uncertainty estimates should also be consistent: labels that are deemed
improbable at one exit should not reappear within the confidence interval / set
of later exits. We show that standard uncertainty quantification techniques,
like Bayesian methods or conformal prediction, can lead to inconsistency across
exits. We address this problem by applying anytime-valid confidence sequences
(AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across
exits. We examine the theoretical and practical challenges of applying AVCSs to
EENNs and empirically validate our approach on both regression and
classification tasks.
( 2
min )
How do language models deal with the limited bandwidth of the residual
stream? Prior work has suggested that some attention heads and MLP layers may
perform a "memory management" role. That is, clearing residual stream
directions set by earlier layers by reading in information and writing out the
negative version. In this work, we present concrete evidence for this
phenomenon in a 4-layer transformer. We identify several heads in layer 2 that
consistently remove the output of a single layer 0 head. We then verify that
this erasure causally depends on the original written direction. We further
demonstrate that direct logit attribution (DLA) suggests that writing and
erasing heads directly contribute to predictions, when in fact their effects
cancel out. Then we present adversarial prompts for which this effect is
particularly salient. These findings reveal that memory management can make DLA
results misleading. Accordingly, we make concrete recommendations for circuit
analysis to prevent interpretability illusions.
( 2
min )
We present an oracle-efficient relaxation for the adversarial contextual
bandits problem, where the contexts are sequentially drawn i.i.d. from a known
distribution and the cost sequence is chosen by an online adversary. Our
algorithm has a regret bound of
$O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls
per round to an offline optimization oracle, where $K$ denotes the number of
actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of
policies. This is the first result to improve the prior best bound of
$O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$ as obtained by Syrgkanis et
al. at NeurIPS 2016, and the first to match the original bound of Langford and
Zhang at NeurIPS 2007 which was obtained for the stochastic case.
( 2
min )
Bayesian bandit algorithms with approximate Bayesian inference have been
widely used in real-world applications. However, there is a large discrepancy
between the superior practical performance of these approaches and their
theoretical justification. Previous research only indicates a negative
theoretical result: Thompson sampling could have a worst-case linear regret
$\Omega(T)$ with a constant threshold on the inference error measured by one
$\alpha$-divergence. To bridge this gap, we propose an Enhanced Bayesian Upper
Confidence Bound (EBUCB) framework that can efficiently accommodate bandit
problems in the presence of approximate inference. Our theoretical analysis
demonstrates that for Bernoulli multi-armed bandits, EBUCB can achieve the
optimal regret order $O(\log T)$ if the inference error measured by two
different $\alpha$-divergences is less than a constant, regardless of how large
this constant is. To the best of our knowledge, our study provides the first
theoretical regret bound that is better than $o(T)$ in the setting of constant
approximate inference error. Furthermore, in concordance with the negative
results in previous studies, we show that only one bounded $\alpha$-divergence
is insufficient to guarantee a sub-linear regret.
( 3
min )
We study the problem of learning a decentralized linear quadratic regulator
when the system model is unknown a priori. We propose an online learning
algorithm that adaptively designs a control policy as new data samples from a
single system trajectory become available. Our algorithm design uses a
disturbance-feedback representation of state-feedback controllers coupled with
online convex optimization with memory and delayed feedback. We show that our
controller enjoys an expected regret that scales as $\sqrt{T}$ with the time
horizon $T$ for the case of partially nested information pattern. For more
general information patterns, the optimal controller is unknown even if the
system model is known. In this case, the regret of our controller is shown with
respect to a linear sub-optimal controller. We validate our theoretical
findings using numerical experiments.
( 2
min )
This paper introduces a new approach to address the issue of class imbalance
in graph neural networks (GNNs) for learning on graph-structured data. Our
approach integrates imbalanced node classification and Bias-Variance
Decomposition, establishing a theoretical framework that closely relates data
imbalance to model variance. We also leverage a graph augmentation technique to
estimate the variance, and design a regularization term to alleviate the impact
of imbalance. Exhaustive tests are conducted on multiple benchmarks, including
naturally imbalanced datasets and public-split class-imbalanced datasets,
demonstrating that our approach outperforms state-of-the-art methods in various
imbalanced scenarios. This work provides a novel theoretical perspective for
addressing the problem of imbalanced node classification in GNNs.
( 2
min )
The causalimages R package enables causal inference with image and image
sequence data, providing new tools for integrating novel data sources like
satellite and bio-medical imagery into the study of cause and effect. One set
of functions enables image-based causal inference analyses. For example, one
key function decomposes treatment effect heterogeneity by images using an
interpretable Bayesian framework. This allows for determining which types of
images or image sequences are most responsive to interventions. A second
modeling function allows researchers to control for confounding using images.
The package also allows investigators to produce embeddings that serve as
vector summaries of the image or video content. Finally, infrastructural
functions are also provided, such as tools for writing large-scale image and
image sequence data as sequentialized byte strings for more rapid image
analysis. causalimages therefore opens new capabilities for causal inference in
R, letting researchers use informative imagery in substantive analyses in a
fast and accessible manner.
( 2
min )
With the exponential growth in large language models (LLMs), leveraging their
emergent properties for specialized domains like finance merits exploration.
However, regulated fields such as finance pose unique constraints, requiring
domain-optimized frameworks. We present ConFIRM, an LLM-based conversational
financial information retrieval model tailored for query intent classification
and knowledge base labeling.
ConFIRM comprises two modules:
1) a method to synthesize finance domain-specific question-answer pairs, and
2) evaluation of parameter efficient fine-tuning approaches for the query
classification task. We generate a dataset of over 4000 samples, assessing
accuracy on a separate test set.
ConFIRM achieved over 90% accuracy, essential for regulatory compliance.
ConFIRM provides a data-efficient solution to extract precise query intent for
financial dialog systems.
( 2
min )
Since their inception, Variational Autoencoders (VAEs) have become central in
machine learning. Despite their widespread use, numerous questions regarding
their theoretical properties remain open. Using PAC-Bayesian theory, this work
develops statistical guarantees for VAEs. First, we derive the first
PAC-Bayesian bound for posterior distributions conditioned on individual
samples from the data-generating distribution. Then, we utilize this result to
develop generalization guarantees for the VAE's reconstruction loss, as well as
upper bounds on the distance between the input and the regenerated
distributions. More importantly, we provide upper bounds on the Wasserstein
distance between the input distribution and the distribution defined by the
VAE's generative model.
( 2
min )
Shapley values are among the most popular tools for explaining predictions of
blackbox machine learning models. However, their high computational cost
motivates the use of sampling approximations, inducing a considerable degree of
uncertainty. To stabilize these model explanations, we propose ControlSHAP, an
approach based on the Monte Carlo technique of control variates. Our
methodology is applicable to any machine learning model and requires virtually
no extra computation or modeling effort. On several high-dimensional datasets,
we find it can produce dramatic reductions in the Monte Carlo variability of
Shapley estimates.
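The underlying control-variate mechanics can be shown generically: to estimate
E[f(X)], subtract a correlated quantity g(X) with known mean, scaled by the
variance-minimizing coefficient. The f and g below are toys; in ControlSHAP
the control variate is reportedly built from a linear approximation of the
model whose Shapley values are known in closed form, which is not reproduced
here.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x) + 0.1 * x**2
g = lambda x: x                       # control variate with known mean E[g] = 0

X = rng.normal(size=100_000).reshape(1000, 100)   # 1000 replications of n=100
fx, gx = f(X), g(X)
c = np.cov(fx.ravel(), gx.ravel())[0, 1] / gx.var()   # c* = Cov(f,g)/Var(g)

plain = fx.mean(axis=1)                          # vanilla MC estimates
cv = (fx - c * (gx - 0.0)).mean(axis=1)          # control-variate estimates
print("variance reduction factor:", plain.var() / cv.var())
```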
( 2
min )
Accurately predicting the elastic properties of crystalline solids is vital
for computational materials science. However, traditional atomistic scale ab
initio approaches are computationally intensive, especially for studying
complex materials with a large number of atoms in a unit cell. We introduce a
novel data-driven approach to efficiently predict the elastic properties of
crystal structures using SE(3)-equivariant graph neural networks (GNNs). This
approach yields important scalar elastic moduli with accuracy comparable to
recent data-driven studies. Importantly, our symmetry-aware GNN model also
enables the prediction of the strain energy density (SED) and the associated
elastic constants, the fundamental tensorial quantities that are significantly
influenced by a material's crystallographic group. The model consistently
distinguishes independent elements of SED tensors, in accordance with the
symmetry of the crystal structures. Finally, our deep learning model possesses
meaningful latent features, offering an interpretable prediction of the elastic
properties.
( 2
min )
Multiscale behaviour is a hallmark feature of complex nonlinear systems. While
simulation using classical numerical methods is restricted by local
\textit{Taylor}-series constraints, multiscale techniques are often limited
by the need to find heuristic closures. This study proposes a new method for simulating
multiscale problems using deep neural networks. By leveraging the hierarchical
learning of neural network time steppers, the method adapts time steps to
approximate dynamical system flow maps across timescales. This approach
achieves state-of-the-art performance in less computational time compared to
fixed-step neural network solvers. The proposed method is demonstrated on
several nonlinear dynamical systems, and source codes are provided for
implementation. This method has the potential to benefit multiscale analysis of
complex systems and encourage further investigation in this area.
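A minimal sketch of the flow-map composition idea, under the assumption that one stepper is available per timescale; the exact-flow lambdas below stand in for the trained neural network time steppers and are purely illustrative.

```python
import numpy as np

# Hypothetical flow-map steppers: each maps the state x(t) to x(t + dt).
# In the paper these would be trained neural networks; the exact flow of
# dx/dt = -x stands in for them here.
def make_stepper(dt):
    return lambda x: x * np.exp(-dt)

steppers = {dt: make_stepper(dt) for dt in (1.0, 0.1, 0.01)}

def advance(x, horizon, steppers):
    """Greedily cover the horizon with the largest available time steps."""
    t = 0.0
    for dt in sorted(steppers, reverse=True):
        while t + dt <= horizon + 1e-12:
            x = steppers[dt](x)
            t += dt
    return x

print(advance(1.0, 2.34, steppers), np.exp(-2.34))  # composition vs. truth
```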
( 2
min )
Knee-Joint Osteoarthritis (KOA) is a prevalent cause of global disability and
is inherently complex to diagnose due to its subtle radiographic markers and
individualized progression. One promising classification avenue involves
applying deep learning methods; however, these techniques demand extensive,
diversified datasets, which pose substantial challenges due to medical data
collection restrictions. Existing practices typically resort to smaller
datasets and transfer learning. However, this approach often inherits
unnecessary pre-learned features that can clutter the classifier's vector
space, potentially hampering performance. This study proposes a novel paradigm
for improving post-training specialized classifiers by introducing adaptive
variance thresholding (AVT) followed by Neural Architecture Search (NAS). This
approach led to two key outcomes: an increase in the initial accuracy of the
pre-trained KOA models and a 60-fold reduction in the NAS input vector space,
thus facilitating faster inference speed and a more efficient hyperparameter
search. We also applied this approach to an external model trained for KOA
classification. Despite its already strong initial performance, applying our
methodology improved its average accuracy, making it one of the top three KOA
classification models.
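As a sketch of the variance-thresholding step (the adaptive rule below, a variance quantile, is our assumption rather than necessarily the paper's AVT criterion):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 2048))   # hypothetical pre-trained features
features[:, 100:] *= 0.01                 # most dimensions nearly constant

# "Adaptive" rule assumed here: set the threshold from the empirical
# variance distribution (keep the top 5% of dimensions by variance).
threshold = np.quantile(features.var(axis=0), 0.95)
reduced = VarianceThreshold(threshold=threshold).fit_transform(features)
print(features.shape, "->", reduced.shape)
```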
( 2
min )
Positional encodings are employed to capture the high-frequency information
of the encoded signals in implicit neural representations (INRs). In this paper,
we propose a novel positional encoding method which improves the reconstruction
quality of the INR. The proposed embedding method is more advantageous for
compact data representation because it provides a greater number of frequency
bases than existing methods. Our experiments show that the proposed method
achieves a significant gain in rate-distortion performance in the compression
task, without introducing any additional complexity, and higher
reconstruction quality in novel view synthesis.
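For context, a standard octave-spaced Fourier-feature positional encoding is sketched below; the paper's method differs by packing more frequency bases into the embedding, which this sketch does not reproduce.

```python
import numpy as np

def positional_encoding(x, num_freqs=8):
    """Octave-spaced Fourier-feature embedding commonly used in INRs."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    angles = x[..., None] * freqs                  # (..., num_freqs)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

coords = np.linspace(0.0, 1.0, 5)
print(positional_encoding(coords).shape)  # (5, 16): 8 sines + 8 cosines
```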
( 2
min )
Knee Osteoarthritis (KOA), a leading cause of disability worldwide, is
challenging to detect early due to subtle radiographic indicators. Diverse,
extensive datasets are needed but are challenging to compile because of
privacy, data collection limitations, and the progressive nature of KOA.
However, a model capable of projecting genuine radiographs into different OA
stages could augment data pools, enhance algorithm training, and offer
pre-emptive prognostic insights. In this study, we trained a CycleGAN model to
synthesize past and future stages of KOA on any genuine radiograph. The model
was validated using a Convolutional Neural Network that was deceived into
misclassifying disease stages in transformed images, demonstrating the
CycleGAN's ability to effectively transform disease characteristics forward or
backward in time. The model was particularly effective in synthesizing future
disease states and showed an exceptional ability to retroactively transition
late-stage radiographs to earlier stages by eliminating osteophytes and
expanding knee joint space, signature characteristics of None or Doubtful KOA.
The model's results signify a promising potential for enhancing diagnostic
models, data augmentation, and educational and prognostic usage in healthcare.
Nevertheless, further refinement, validation, and a broader evaluation process
encompassing both CNN-based assessments and expert medical feedback are
emphasized for future research and development.
( 2
min )
This paper proposes an algorithm that implements binary encoding of the
categorical features of neural network model input data, together with changes
to the forward and backpropagation procedures. These changes ensure that model
weight updates resulting from training on data instances of some feature
category affect the forward-pass calculations only for input data instances of
that same feature category, as is the case when one-hot encoding is used
for categorical features.
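A small numpy sketch of the property at stake: with one-hot encoding each category owns a disjoint weight row, whereas naive binary encoding shares bit weights across categories, so an update for one category leaks into another. The masking scheme the paper adds to the forward and backward passes is not reproduced here.

```python
import numpy as np

num_categories, dim = 4, 3

# One-hot: each category owns a disjoint weight row, so a gradient step for
# category 2 leaves every other category's forward pass untouched.
W = np.zeros((num_categories, dim))
grad = np.ones(dim)
W[2] -= 0.1 * grad
print(W[1])          # category 1 unchanged: [0. 0. 0.]

# Naive binary encoding: bit weights are shared across categories, so the
# same update leaks into category 3 (which shares a bit with category 2).
bits = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Wb = np.zeros((2, dim))
Wb -= 0.1 * np.outer(bits[2], grad)
print(bits[3] @ Wb)  # category 3 affected: [-0.1 -0.1 -0.1]
```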
( 2
min )
This paper explores the critical role of differentiation approaches for
data-driven differential equation discovery. Accurate derivatives of the input
data are essential for reliable algorithmic operation, particularly in
real-world scenarios where measurement quality is inevitably compromised. We
propose alternatives to the commonly used finite differences-based method,
notorious for its instability in the presence of noise, which can exacerbate
random errors in the data. Our analysis covers four distinct methods:
Savitzky-Golay filtering, spectral differentiation, smoothing based on
artificial neural networks, and the regularization of derivative variation. We
evaluate these methods in terms of their applicability to realistic problems
and their ability to ensure the convergence of equation discovery
algorithms, providing valuable insights for robust modeling of real-world
processes.
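A hedged comparison of two of the discussed alternatives against finite differences on a noisy sine signal, using standard SciPy/NumPy tools; the window size and noise level below are arbitrary choices, not the paper's settings.

```python
import numpy as np
from scipy.signal import savgol_filter

t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
y = np.sin(t) + 0.01 * np.random.default_rng(0).normal(size=t.size)

fd = np.gradient(y, t)                         # finite differences (noisy)

sg = savgol_filter(y, window_length=31, polyorder=3, deriv=1,
                   delta=t[1] - t[0])          # Savitzky-Golay derivative

k = 2 * np.pi * np.fft.fftfreq(t.size, d=t[1] - t[0])
sp = np.fft.ifft(1j * k * np.fft.fft(y)).real  # spectral differentiation

truth = np.cos(t)
for name, est in [("finite diff", fd), ("Savitzky-Golay", sg), ("spectral", sp)]:
    print(name, np.abs(est - truth).mean())
```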
( 2
min )
In this study, we present an investigation into the anisotropy dynamics and
intrinsic dimension of embeddings in transformer architectures, focusing on the
dichotomy between encoders and decoders. Our findings reveal that the
anisotropy profile in transformer decoders exhibits a distinct bell-shaped
curve, with the highest anisotropy concentrations in the middle layers. This
pattern diverges from the more uniformly distributed anisotropy observed in
encoders. In addition, we found that the intrinsic dimension of embeddings
increases in the initial phases of training, indicating an expansion into
higher-dimensional space; this is then followed by a compression phase towards
the end of training, with decreasing dimensionality suggesting a refinement
into more compact representations. Our results provide fresh insights into the
embedding properties of encoders and decoders.
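One common way to quantify such an anisotropy profile, assumed here to be the mean pairwise cosine similarity of a layer's embeddings (the paper may use a different estimator):

```python
import numpy as np

def anisotropy(embeddings):
    """Mean pairwise cosine similarity: near 0 for isotropic embeddings,
    near 1 when vectors crowd into a narrow cone."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    return (sims.sum() - n) / (n * (n - 1))   # exclude self-similarity

rng = np.random.default_rng(0)
isotropic = rng.normal(size=(1000, 64))
cone = isotropic + 5.0                         # shared offset -> narrow cone
print(anisotropy(isotropic), anisotropy(cone))
```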
( 2
min )
Recent research indicates that frequent model communication stands as a major
bottleneck to the efficiency of decentralized machine learning (ML),
particularly for large-scale and over-parameterized neural networks (NNs). In
this paper, we introduce MALCOM-PSGD, a new decentralized ML algorithm that
strategically integrates gradient compression techniques with model
sparsification. MALCOM-PSGD leverages proximal stochastic gradient descent to
handle the non-smoothness resulting from the $\ell_1$ regularization in model
sparsification. Furthermore, we adapt vector source coding and dithering-based
quantization for compressed gradient communication of sparsified models. Our
analysis shows that decentralized proximal stochastic gradient descent with
compressed communication has a convergence rate of
$\mathcal{O}\left(\ln(t)/\sqrt{t}\right)$, assuming a diminishing learning rate,
where $t$ denotes the number of iterations. Numerical results verify our
theoretical findings and demonstrate that our method reduces communication
costs by approximately $75\%$ when compared to the state-of-the-art method.
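The proximal step that handles the non-smooth $\ell_1$ term is soft-thresholding; a minimal single-node sketch (omitting the decentralized communication, compression, and quantization the paper adds) follows.

```python
import numpy as np

def prox_l1(w, lam):
    """Proximal operator of lam * ||w||_1 (soft-thresholding)."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def proximal_sgd_step(w, grad, lr, lam):
    # Gradient step on the smooth loss, prox step on the l1 sparsifier.
    return prox_l1(w - lr * grad, lr * lam)

w = np.array([0.9, -0.05, 0.3, -0.6])
grad = np.array([0.2, 0.1, -0.4, 0.3])
print(proximal_sgd_step(w, grad, lr=0.5, lam=0.2))  # small entries zero out
```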
( 2
min )
Advanced materials are needed to further next-generation technologies such as
quantum computing, carbon capture, and low-cost medical imaging. However,
advanced materials discovery is confounded by two fundamental challenges: the
challenge of a high-dimensional, complex materials search space and the
challenge of combining knowledge, i.e., data fusion across instruments and
labs. To overcome the first challenge, researchers employ knowledge of the
underlying material synthesis-structure-property relationship, as a material's
structure is often predictive of its functional property and vice versa. For
example, optimal materials often occur along composition-phase boundaries or
within specific phase regions. Additionally, knowledge of the
synthesis-structure-property relationship is fundamental to understanding
underlying physical mechanisms. However, quantifying the
synthesis-structure-property relationship requires overcoming the second
challenge. Researchers must merge knowledge gathered across instruments,
measurement modalities, and even laboratories. We present the
Synthesis-structure-property relAtionship coreGionalized lEarner (SAGE)
algorithm, a fully Bayesian algorithm that uses multimodal coregionalization to
merge knowledge across data sources and learn synthesis-structure-property
relationships.
( 2
min )
Deep neural networks have achieved significant success in the last decades,
but they are not well-calibrated and often produce unreliable predictions. A
large body of literature relies on uncertainty quantification to evaluate the
reliability of a learning model, which is particularly important for
applications of out-of-distribution (OOD) detection and misclassification
detection. We are interested in uncertainty quantification for interdependent
node-level classification. We start our analysis based on graph posterior
networks (GPNs) that optimize the uncertainty cross-entropy (UCE)-based loss
function. We describe the theoretical limitations of the widely-used UCE loss.
To alleviate the identified drawbacks, we propose a distance-based
regularization that encourages clustered OOD nodes to remain clustered in the
latent space. We conduct extensive comparison experiments on eight standard
datasets and demonstrate that the proposed regularization outperforms the
state-of-the-art in both OOD detection and misclassification detection.
( 2
min )
With the advances in computationally efficient artificial intelligence (AI)
techniques and their numerous applications in everyday life, there is a
pressing need to explain the computational details hidden in black-box AI
techniques such as the most popular machine learning and deep learning methods.
Explainable AI (xAI) originates from these challenges and has recently gained
attention from researchers seeking to add comprehensive explainability to
traditional AI systems. This motivates the development of an appropriate
framework for successful applications of xAI in real-life scenarios with
respect to innovation, risk mitigation, ethical issues, and value to users. In
this book chapter, an in-depth analysis of several xAI frameworks and methods,
including LIME (Local Interpretable Model-agnostic Explanations) and SHAP
(SHapley Additive exPlanations), is provided. A Random Forest classifier is
used as the black-box model on a publicly available diabetes-symptoms dataset,
with LIME and SHAP applied for interpretation. The results are promising in
terms of the transparency, validity, and trustworthiness of diabetes disease
prediction.
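As a hedged sketch of this setup, the snippet below explains a Random Forest with the real shap library; sklearn's breast-cancer data stands in for the diabetes symptoms dataset, which is an assumption of this sketch.

```python
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)        # stand-in public dataset
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)             # exact tree Shapley values
shap_values = explainer.shap_values(X[:5])
print(np.shape(shap_values))                      # per-class attributions
```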
( 2
min )
We show that a constant-size constant-error coreset for polytope distance is
simple to maintain under merges of coresets. However, increasing the size
cannot improve the error bound significantly beyond that constant.
( 2
min )
This report explores the theory that explains the high sparsity phenomenon
\citep{tosato2023emergent} observed in the forward-forward algorithm
\citep{hinton2022forward}. The two proposed theorems predict the sparsity
changes of a single data point's activation in two cases: Theorem
\ref{theorem:1}, decreasing the goodness of the whole batch; and Theorem
\ref{theorem:2}, applying the complete forward-forward algorithm to decrease the
goodness for negative data and increase the goodness for positive data. The
theory aligns well with the experiments tested on the MNIST dataset.
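A minimal PyTorch sketch of the "goodness" quantity and the forward-forward objective referenced by the second theorem; the threshold value and loss form follow Hinton's formulation and are assumptions here.

```python
import torch
import torch.nn.functional as F

def goodness(h):
    """Layer 'goodness': the sum of squared activations."""
    return (h ** 2).sum(dim=1)

layer = torch.nn.Linear(784, 500)
x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)
theta = 2.0                                     # goodness threshold

g_pos = goodness(torch.relu(layer(x_pos)))
g_neg = goodness(torch.relu(layer(x_neg)))
# Push goodness above theta for positive data and below it for negatives.
loss = F.softplus(theta - g_pos).mean() + F.softplus(g_neg - theta).mean()
loss.backward()
```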
( 2
min )
We present an oracle-efficient relaxation for the adversarial contextual
bandits problem, where the contexts are sequentially drawn i.i.d from a known
distribution and the cost sequence is chosen by an online adversary. Our
algorithm has a regret bound of
$O(T^{\frac{2}{3}}(K\log(|\Pi|))^{\frac{1}{3}})$ and makes at most $O(K)$ calls
per round to an offline optimization oracle, where $K$ denotes the number of
actions, $T$ denotes the number of rounds and $\Pi$ denotes the set of
policies. This is the first result to improve the prior best bound of
$O((TK)^{\frac{2}{3}}(\log(|\Pi|))^{\frac{1}{3}})$ as obtained by Syrgkanis et
al. at NeurIPS 2016, and the first to match the original bound of Langford and
Zhang at NeurIPS 2007 which was obtained for the stochastic case.
( 2
min )
Bayesian bandit algorithms with approximate Bayesian inference have been
widely used in real-world applications. However, there is a large discrepancy
between the superior practical performance of these approaches and their
theoretical justification. Previous research only indicates a negative
theoretical result: Thompson sampling could have a worst-case linear regret
$\Omega(T)$ with a constant threshold on the inference error measured by one
$\alpha$-divergence. To bridge this gap, we propose an Enhanced Bayesian Upper
Confidence Bound (EBUCB) framework that can efficiently accommodate bandit
problems in the presence of approximate inference. Our theoretical analysis
demonstrates that for Bernoulli multi-armed bandits, EBUCB can achieve the
optimal regret order $O(\log T)$ if the inference error measured by two
different $\alpha$-divergences is less than a constant, regardless of how large
this constant is. To the best of our knowledge, our study provides the first
theoretical regret bound that is better than $o(T)$ in the setting of constant
approximate inference error. Furthermore, in concordance with the negative
results in previous studies, we show that only one bounded $\alpha$-divergence
is insufficient to guarantee a sub-linear regret.
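For intuition, a plain Bayesian UCB loop on Bernoulli arms with exact Beta posteriors is sketched below; EBUCB's contribution, the correction for approximate-inference error, is not modeled, and the quantile schedule is the standard Bayes-UCB choice, an assumption of this sketch.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])
a = np.ones(3)   # Beta(1, 1) posteriors over each arm's mean
b = np.ones(3)

T = 2000
for t in range(1, T + 1):
    q = 1.0 - 1.0 / (t * np.log(T) ** 2)   # standard Bayes-UCB quantile
    arm = int(np.argmax(beta.ppf(q, a, b)))
    reward = rng.random() < true_means[arm]
    a[arm] += reward
    b[arm] += 1 - reward

print(a + b - 2)   # pull counts concentrate on the best arm
```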
( 3
min )
Real-time density estimation is ubiquitous in many applications, including
computer vision and signal processing. Kernel density estimation is arguably
one of the most commonly used density estimation techniques, and the use of
"sliding window" mechanism adapts kernel density estimators to dynamic
processes. In this paper, we derive the asymptotic mean integrated squared
error (AMISE) upper bound for the "sliding window" kernel density estimator.
This upper bound provides a principled guide to devise a novel estimator, which
we name the temporal adaptive kernel density estimator (TAKDE). Compared to
heuristic approaches to the "sliding window" kernel density estimator, TAKDE is
theoretically optimal in terms of the worst-case AMISE. We provide numerical
experiments using synthetic and real-world datasets, showing that TAKDE
outperforms other state-of-the-art dynamic density estimators (including those
outside of kernel family). In particular, TAKDE achieves a superior test
log-likelihood with a smaller runtime.
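A minimal "sliding window" KDE to ground the terminology; TAKDE's AMISE-optimal weighting and bandwidth selection are not reproduced, and the class below is purely illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

class SlidingWindowKDE:
    """Refit a KDE on the most recent observations; TAKDE additionally
    chooses weights and bandwidth to optimize worst-case AMISE."""
    def __init__(self, window=200):
        self.window = window
        self.buffer = []

    def update(self, x):
        self.buffer.append(x)
        self.buffer = self.buffer[-self.window:]

    def pdf(self, grid):
        return gaussian_kde(self.buffer)(grid)

est = SlidingWindowKDE(window=200)
for x in np.random.default_rng(0).normal(size=1000):
    est.update(x)
print(est.pdf(np.array([0.0, 1.0])))
```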
( 2
min )
The consistency of the maximum likelihood estimator for mixtures of
elliptically-symmetric distributions for estimating its population version is
shown, where the underlying distribution $P$ is nonparametric and does not
necessarily belong to the class of mixtures on which the estimator is based. In
a situation where $P$ is a mixture of sufficiently well-separated but nonparametric
distributions it is shown that the components of the population version of the
estimator correspond to the well separated components of $P$. This provides
some theoretical justification for the use of such estimators for cluster
analysis in the case that $P$ has well-separated subpopulations, even if these
subpopulations differ from what the mixture model assumes.
( 2
min )
Recent advances have shown that Gaussian process (GP) priors, or their finite realisations, can
be encoded using deep generative models such as variational autoencoders
(VAEs). These learned generators can serve as drop-in replacements for the
original priors during MCMC inference. While this approach enables efficient
inference, it loses information about the hyperparameters of the original
models, and consequently makes inference over hyperparameters impossible and
the learned priors indistinct. To overcome this limitation, we condition the
VAE on stochastic process hyperparameters. This allows the joint encoding of
hyperparameters with GP realizations and their subsequent estimation during
inference. Further, we demonstrate that our proposed method, PriorCVAE, is
agnostic to the nature of the models which it approximates, and can be used,
for instance, to encode solutions of ODEs. It provides a practical tool for
approximate inference and shows potential in real-life spatial and
spatiotemporal applications.
( 2
min )
We investigate how shallow ReLU networks interpolate between known regions.
Our analysis shows that empirical risk minimizers converge to a minimum norm
interpolant as the number of data points and parameters tends to infinity when
a weight-decay regularizer is applied with a coefficient that vanishes at a
precise rate as the network width and the number of data points grow. With and
without explicit regularization, we numerically study the implicit bias of
common optimization algorithms towards known minimum norm interpolants.
( 2
min )
We present the new Orthogonal Polynomials Approximation Algorithm (OPAA), a
parallelizable algorithm that solves two problems from a functional analytic
approach: first, it finds a smooth functional estimate of a density function,
whether it is normalized or not; second, the algorithm provides an estimate of
the normalizing weight. In the context of Bayesian inference, OPAA provides an
estimate of the posterior function as well as the normalizing weight, which is
also known as the evidence.
A core component of OPAA is a special transform of the square root of the
joint distribution into a special functional space of our own construction. Through
this transform, the evidence is equated with the $L^2$ norm of the transformed
function, squared. Hence, the evidence can be estimated by the sum of squares
of the transform coefficients. The computations can be parallelized and
completed in one pass.
To compute the transform coefficients, OPAA proposes a new computational
scheme leveraging Gauss--Hermite quadrature in higher dimensions. Not only does
it avoid the potential high variance problem associated with random sampling
methods, but it also enables one to speed up the computation by parallelization,
and significantly reduces the complexity by a vector decomposition.
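A one-dimensional sketch of the core computation under stated assumptions: expand the square root of an unnormalized density in orthonormal Hermite functions, estimate the coefficients with Gauss-Hermite quadrature, and recover the evidence as the sum of squared coefficients (here the true evidence is 2; the basis choice and truncation are illustrative).

```python
import numpy as np
from scipy.special import eval_hermite, factorial

def hermite_fn(n, x):
    """Orthonormal Hermite function psi_n(x)."""
    norm = np.sqrt(2.0 ** n * factorial(n) * np.sqrt(np.pi))
    return eval_hermite(n, x) * np.exp(-x ** 2 / 2) / norm

def sqrt_density(x):          # sqrt of an unnormalized Gaussian, evidence 2
    return np.sqrt(2.0 * np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi))

nodes, weights = np.polynomial.hermite.hermgauss(60)
coeffs = [np.sum(weights * np.exp(nodes ** 2)
                 * sqrt_density(nodes) * hermite_fn(n, nodes))
          for n in range(20)]
print(np.sum(np.square(coeffs)))   # ~2.0: evidence as sum of squares
```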
( 2
min )
A classic inferential statistical problem is the goodness-of-fit (GOF) test.
Such a test can be challenging when the hypothesized parametric model has an
intractable likelihood and its distributional form is not available. Bayesian
methods for GOF can be appealing due to their ability to incorporate expert
knowledge through prior distributions.
However, standard Bayesian methods for this test often require strong
distributional assumptions on the data and their relevant parameters. To
address this issue, we propose a semi-Bayesian nonparametric (semi-BNP)
procedure in the context of the maximum mean discrepancy (MMD) measure that can
be applied to the GOF test. Our method introduces a novel Bayesian estimator
for the MMD, enabling the development of a measure-based hypothesis test for
intractable models. Through extensive experiments, we demonstrate that our
proposed test outperforms frequentist MMD-based methods by achieving lower
false rejection and false acceptance rates of the null hypothesis. Furthermore, we
showcase the versatility of our approach by embedding the proposed estimator
within a generative adversarial network (GAN) framework. It facilitates a
robust BNP learning approach as another significant application of our method.
With our BNP procedure, this new GAN approach can enhance sample diversity and
improve inferential accuracy compared to traditional techniques.
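For reference, the standard unbiased squared-MMD estimator with an RBF kernel is sketched below; the paper's Bayesian estimator of the MMD builds on this quantity and is not reproduced here.

```python
import numpy as np

def mmd_unbiased(X, Y, bandwidth=1.0):
    """Unbiased squared-MMD estimate with an RBF kernel."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bandwidth ** 2))
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    n, m = len(X), len(Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # observed data
Y = rng.normal(0.5, 1.0, size=(200, 2))   # draws from hypothesized model
print(mmd_unbiased(X, Y))                 # larger values indicate misfit
```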
( 3
min )
Early-exit neural networks (EENNs) facilitate adaptive inference by producing
predictions at multiple stages of the forward pass. In safety-critical
applications, these predictions are only meaningful when complemented with
reliable uncertainty estimates. Yet, due to their sequential structure, an
EENN's uncertainty estimates should also be consistent: labels that are deemed
improbable at one exit should not reappear within the confidence interval / set
of later exits. We show that standard uncertainty quantification techniques,
like Bayesian methods or conformal prediction, can lead to inconsistency across
exits. We address this problem by applying anytime-valid confidence sequences
(AVCSs) to the exits of EENNs. By design, AVCSs maintain consistency across
exits. We examine the theoretical and practical challenges of applying AVCSs to
EENNs and empirically validate our approach on both regression and
classification tasks.
( 2
min )
NVIDIA today unveiled at SC23 the next wave of technologies that will lift scientific and industrial research centers worldwide to new levels of performance and energy efficiency. “NVIDIA hardware and software innovations are creating a new class of AI supercomputers,” said Ian Buck, vice president of the company’s high performance computing and hyperscale data center
( 9
min )
A widely acclaimed large language model for genomic data has demonstrated its ability to generate gene sequences that closely resemble real-world variants of SARS-CoV-2, the virus behind COVID-19. Called GenSLMs, the model, which last year won the Gordon Bell special prize for high performance computing-based COVID-19 research, was trained on a dataset of nucleotide sequences
( 6
min )
Michael Kuehn and Davide Vodola are taking to new heights work that’s pioneering quantum computing for the world’s largest chemical company. The BASF researchers are demonstrating how a quantum algorithm can see what no traditional simulation can — key attributes of NTA, a compound with applications that include removing toxic metals like iron from a
( 6
min )
Dozens of new supercomputers for scientific computing will soon hop online, powered by NVIDIA’s breakthrough GH200 Grace Hopper Superchip for giant-scale AI and high performance computing. The NVIDIA GH200 enables scientists and researchers to tackle the world’s most challenging problems by accelerating complex AI and HPC applications running terabytes of data. At the SC23 supercomputing
( 6
min )
At a basic level, Machine Learning (ML) technology learns from data to make predictions. Businesses use their data with an ML-powered personalization service to elevate their customer experience. This approach allows businesses to use data to derive actionable insights and help grow their revenue and brand loyalty. Amazon Personalize accelerates your digital transformation with ML, […]
( 8
min )
One of the most common applications of generative AI and large language models (LLMs) is answering questions based on a specific external knowledge corpus. Retrieval-Augmented Generation (RAG) is a popular technique for building question answering systems that use an external knowledge base. To learn more, refer to Build a powerful question answering bot with Amazon […]
( 7
min )
AI Weirdness: the strange side of machine learning
( 2
min )
In recent years, significant progress in generative AI has highlighted the
important role of physics-inspired models that utilize advanced mathematical
concepts based on fundamental physics principles to enhance artificial
intelligence capabilities. Among these models, those based on diffusion
equations have greatly improved image quality. This study aims to explore the
potential uses of the Maxwell-Boltzmann equation, which forms the basis of the
kinetic theory of gases, and the Michaelis-Menten model in Marketing Mix
Modelling (MMM) applications. We propose incorporating these equations into
Hierarchical Bayesian models to analyse consumer behaviour in the context of
advertising. These equation sets excel in accurately describing the random
dynamics in complex systems like social interactions and consumer-advertising
interactions.
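As a concrete anchor, the Michaelis-Menten response maps advertising spend to a saturating effect; in a hierarchical Bayesian MMM the two parameters would receive per-channel priors. The numbers below are invented for illustration.

```python
import numpy as np

def michaelis_menten(spend, v_max, k_m):
    """Saturating ad response: v_max is the maximum achievable effect,
    k_m the spend at which half of it is reached."""
    return v_max * spend / (k_m + spend)

spend = np.array([0.0, 10.0, 50.0, 100.0, 500.0])
print(michaelis_menten(spend, v_max=100.0, k_m=50.0))
# [0, ~16.7, 50, ~66.7, ~90.9]: diminishing returns as a channel saturates
```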
( 2
min )
Adjoint operators have been found to be effective in the exploration of CNN's
inner workings [1]. However, the previous no-bias assumption restricted its
generalization. We overcome the restriction via embedding input images into an
extended normed space that includes bias in all CNN layers as part of the
extended space and propose an adjoint-operator-based algorithm that maps
high-level weights back to the extended input space for reconstructing an
effective hypersurface. Such a hypersurface can be computed for an arbitrary unit
in the CNN, and we prove that this reconstructed hypersurface, when multiplied
by the original input (through an inner product), will precisely replicate the
output value of each unit. We show experimental results based on the CIFAR-10
and CIFAR-100 data sets where the proposed approach achieves near 0 activation
value reconstruction error.
( 2
min )
We consider the straggler problem in decentralized learning over a logical
ring while preserving user data privacy. Specifically, we extend the recently
proposed framework of differential privacy (DP) amplification by
decentralization by Cyffers and Bellet to include overall training
latency, comprising both computation and communication latency. Analytical
results on both the convergence speed and the DP level are derived for both a
skipping scheme (which ignores the stragglers after a timeout) and a baseline
scheme that waits for each node to finish before the training continues. A
trade-off between overall training latency, accuracy, and privacy,
parameterized by the timeout of the skipping scheme, is identified and
empirically validated for logistic regression on a real-world dataset and for
image classification using the MNIST and CIFAR-10 datasets.
( 2
min )
This paper integrates manifold learning techniques within a \emph{Gaussian
process upper confidence bound} algorithm to optimize an objective function on
a manifold. Our approach is motivated by applications where a full
representation of the manifold is not available and querying the objective is
expensive. We rely on a point cloud of manifold samples to define a graph
Gaussian process surrogate model for the objective. Query points are
sequentially chosen using the posterior distribution of the surrogate model
given all previous queries. We establish regret bounds in terms of the number
of queries and the size of the point cloud. Several numerical examples
complement the theory and illustrate the performance of our method.
( 2
min )
For a widely-studied data model and general loss and sample-hardening
functions we prove that the Supervised Contrastive Learning (SCL), Hard-SCL
(HSCL), and Unsupervised Contrastive Learning (UCL) risks are minimized by
representations that exhibit Neural Collapse (NC), i.e., the class means form
an Equiangular Tight Frame (ETF) and data from the same class are mapped to
the same representation. We also prove that for any representation mapping, the
HSCL and Hard-UCL (HUCL) risks are lower bounded by the corresponding SCL and
UCL risks. Although the optimality of ETF is known for SCL, albeit only for
InfoNCE loss, its optimality for HSCL and UCL under general loss and hardening
functions is novel. Moreover, our proofs are much simpler, compact, and
transparent. We empirically demonstrate, for the first time, that ADAM
optimization of HSCL and HUCL risks with random initialization and suitable
hardness levels can indeed converge to the NC geometry if we incorporate
unit-ball or unit-sphere feature normalization. Without incorporating hard
negatives or feature normalization, however, the representations learned via
ADAM suffer from dimensional collapse (DC) and fail to attain the NC geometry.
( 2
min )
Federated Learning is expected to provide strong privacy guarantees, as only
gradients or model parameters but no plain text training data is ever exchanged
either between the clients or between the clients and the central server. In
this paper, we challenge this claim by introducing a simple but still very
effective membership inference attack algorithm, which relies only on a single
training step. In contrast to the popular honest-but-curious model, we
investigate a framework with a dishonest central server. Our strategy is
applicable to models with ReLU activations and uses the properties of this
activation function to achieve perfect accuracy. Empirical evaluation on visual
classification tasks with the MNIST, CIFAR10, CIFAR100 and CelebA datasets shows
that our method provides perfect accuracy in identifying one sample in a
training set with thousands of samples. Occasional failures of our method lead
us to discover duplicate images in the CIFAR100 and CelebA datasets.
( 2
min )
In data-driven systems, data exploration is imperative for making real-time
decisions. However, big data is stored in massive databases from which
retrieval is difficult. Approximate Query Processing (AQP) is a technique for providing
approximate answers to aggregate queries based on a summary of the data
(synopsis) that closely replicates the behavior of the actual data, which can
be useful where an approximate answer to the queries would be acceptable in a
fraction of the real execution time. This study explores the novel utilization
of Generative Adversarial Networks (GANs) in the generation of tabular data
that can be employed in AQP for synopsis construction. We thoroughly
investigate the unique challenges posed by the synopsis construction process,
including maintaining data distribution characteristics, handling bounded
continuous and categorical data, and preserving semantic relationships, and then
introduce advanced tabular GAN architectures that overcome these
challenges. Furthermore, we propose and validate a suite of statistical metrics
tailored for assessing the reliability of the GAN-generated synopses. Our
findings demonstrate that advanced GAN variations exhibit a promising capacity
to generate high-fidelity synopses, potentially transforming the efficiency and
effectiveness of AQP in data-driven systems.
( 2
min )
Self-supervised learning (SSL) for WiFi-based human activity recognition
(HAR) holds great promise due to its ability to address the challenge of
insufficient labeled data. However, directly transplanting SSL algorithms
originally designed for other domains, especially contrastive learning, to CSI
data often fails to achieve the expected performance. We attribute this issue
to the inappropriate alignment criteria, which disrupt the semantic distance
consistency between the feature space and the input space. To address this
challenge, we introduce \textbf{A}ntenna \textbf{R}esponse \textbf{C}onsistency
(ARC) as a solution to define proper alignment criteria. ARC is designed to
retain semantic information from the input space while introducing robustness
to real-world noise. Moreover, we substantiate the effectiveness of ARC through
a comprehensive set of experiments, demonstrating its capability to enhance the
performance of self-supervised learning for WiFi-based HAR by achieving an
increase of over 5\% in accuracy in most cases and a best accuracy of
94.97\%.
( 2
min )
With the development of trustworthy Federated Learning (FL), the requirement
of implementing right to be forgotten gives rise to the area of Federated
Unlearning (FU). Compared to machine unlearning, a major challenge of FU lies
in the decentralized and privacy-preserving nature of FL, in which clients
jointly train a global model without sharing their raw data, making it
substantially more intricate to selectively unlearn specific information. In
that regard, many efforts have been made to tackle the challenges of FU and
have achieved significant progress. In this paper, we present a comprehensive
survey of FU. Specifically, we review existing algorithms, objectives, and
evaluation metrics, and identify some challenges of FU. By reviewing and
comparing some studies, we summarize them into a taxonomy for various schemes,
potential applications and future directions.
( 2
min )
Open-set recognition (OSR), the identification of novel categories, can be a
critical component when deploying classification models in real-world
applications. Recent work has shown that familiarity-based scoring rules such
as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are
strong baselines when the closed-set accuracy is high. However, one of the
potential weaknesses of familiarity-based OSR is adversarial attacks. Here, we
present two types of gradient-based adversarial attacks on familiarity scores,
False Familiarity and False Novelty attacks, and evaluate their
effectiveness in informed and uninformed settings on TinyImageNet.
( 2
min )
We prove an upper bound on the covering number of real algebraic varieties,
images of polynomial maps and semialgebraic sets. The bound remarkably improves
the best known bound by Yomdin-Comte, and its proof is much more
straightforward. As a consequence, our result gives a bound on volume of the
tubular neighborhood of a real variety, improving the results by Lotz and
Basu-Lerario. We apply our theory to three main application domains. Firstly,
we derive a near-optimal bound on the covering number of low rank CP tensors.
Secondly, we prove a bound on the sketching dimension for (general) polynomial
optimization problems. Lastly, we deduce generalization error bounds for deep
neural networks with rational or ReLU activations, improving or matching the
best known results in the literature.
( 2
min )
In this work, we present Transformer-based Powered Descent Guidance (T-PDG),
a scalable algorithm for reducing the computational complexity of the direct
optimization formulation of the spacecraft powered descent guidance problem.
T-PDG uses data from prior runs of trajectory optimization algorithms to train
a transformer neural network, which accurately predicts the relationship
between problem parameters and the globally optimal solution for the powered
descent guidance problem. The solution is encoded as the set of tight
constraints corresponding to the constrained minimum-cost trajectory and the
optimal final time of landing. By leveraging the attention mechanism of
transformer neural networks, large sequences of time series data can be
accurately predicted when given only the spacecraft state and landing site
parameters. When applied to the real problem of Mars powered descent guidance,
T-PDG reduces the time for computing the 3-degree-of-freedom fuel-optimal
trajectory, when compared to lossless convexification, from the order of 1-8
seconds to less than 500 milliseconds. A safe and optimal solution is
guaranteed by including a feasibility check in T-PDG before returning the final
trajectory.
( 2
min )
In this paper, we address the limitations of the common data annotation and
training methods for objective single-label classification tasks. Typically,
when annotating such tasks, annotators are only asked to provide a single label
for each sample and annotator disagreement is discarded when a final hard label
is decided through majority voting. We challenge this traditional approach,
acknowledging that determining the appropriate label can be difficult due to
the ambiguity and lack of context in the data samples. Rather than discarding
the information from such ambiguous annotations, our soft label method makes
use of them for training. Our findings indicate that additional annotator
information, such as confidence, secondary label and disagreement, can be used
to effectively generate soft labels. Training classifiers with these soft
labels then leads to improved performance and calibration on the hard label
test set.
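One plausible construction, our assumption rather than the paper's exact recipe: place the annotator's confidence on the primary label, give the remainder to the secondary label, and average across annotators.

```python
import numpy as np

def soft_label(primary, confidence, secondary=None, num_classes=3):
    """Place `confidence` mass on the primary label; give the rest to the
    secondary label, or spread it uniformly if none was provided."""
    p = np.zeros(num_classes)
    p[primary] = confidence
    if secondary is not None:
        p[secondary] += 1.0 - confidence
    else:
        p += (1.0 - confidence) / num_classes
    return p / p.sum()

# Two annotators disagree on a sample; average their soft labels.
a1 = soft_label(primary=0, confidence=0.7, secondary=1)
a2 = soft_label(primary=1, confidence=0.9, secondary=0)
print((a1 + a2) / 2)   # a target distribution for cross-entropy training
```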
( 2
min )
The growing use of digital communication platforms has given rise to various
criminal activities, such as grooming and drug dealing, which pose significant
challenges to law enforcement and forensic experts. This paper presents a
supervised keyphrase extraction approach to detect relevant information in
high-volume chat logs involving grooming and drug dealing for forensic
analysis. The proposed method, JointKPE++, builds upon the JointKPE keyphrase
extractor by employing improvements to handle longer texts effectively. We
evaluate JointKPE++ using BERT-based pre-trained models on grooming and drug
dealing datasets, including BERT, RoBERTa, SpanBERT, and BERTimbau. The results
show significant improvements over traditional approaches and demonstrate the
potential for JointKPE++ to aid forensic experts in efficiently detecting
keyphrases related to criminal activities.
( 2
min )
We consider an unknown multivariate function representing a system, such as a
complex numerical simulator, taking both deterministic and uncertain inputs. Our
objective is to estimate the set of deterministic inputs leading to outputs
whose probability (with respect to the distribution of the uncertain inputs) of
belonging to a given set is less than a given threshold. This problem, which we
call Quantile Set Inversion (QSI), occurs for instance in the context of robust
(reliability-based) optimization problems, when looking for the set of
solutions that satisfy the constraints with sufficiently large probability. To
solve the QSI problem, we propose a Bayesian strategy based on Gaussian process
modeling and the Stepwise Uncertainty Reduction (SUR) principle, to
sequentially choose the points at which the function should be evaluated to
efficiently approximate the set of interest. We illustrate the performance and
interest of the proposed SUR strategy through several numerical experiments.
( 2
min )
Generalized self-concordance is a key property present in the objective
function of many important learning problems. We establish the convergence rate
of a simple Frank-Wolfe variant that uses the open-loop step size strategy
$\gamma_t = 2/(t+2)$, obtaining a $\mathcal{O}(1/t)$ convergence rate for this
class of functions in terms of primal gap and Frank-Wolfe gap, where $t$ is the
iteration count. This avoids the use of second-order information or the need to
estimate local smoothness parameters, as required in previous work. We also show improved
convergence rates for various common cases, e.g., when the feasible region
under consideration is uniformly convex or polyhedral.
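A minimal Frank-Wolfe loop with the open-loop step size $\gamma_t = 2/(t+2)$ on the probability simplex, where the linear minimization oracle returns a vertex; the quadratic objective is a toy stand-in, not a generalized self-concordant function from the paper's class.

```python
import numpy as np

def fw_simplex(grad_f, x0, iters=200):
    """Frank-Wolfe on the probability simplex with open-loop steps."""
    x = x0.copy()
    for t in range(iters):
        g = grad_f(x)
        v = np.zeros_like(x)
        v[np.argmin(g)] = 1.0          # linear minimization oracle: a vertex
        gamma = 2.0 / (t + 2.0)        # open-loop step size
        x = (1 - gamma) * x + gamma * v
    return x

target = np.array([0.2, 0.5, 0.3])
x = fw_simplex(lambda x: 2 * (x - target), np.array([1.0, 0.0, 0.0]))
print(x)   # converges toward the minimizer of ||x - target||^2
```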
( 2
min )
The aim of this study is to define the importance of predictors for black-box
machine learning methods, where the prediction function can be complex and
cannot be represented by statistical parameters. In this paper we defined a
``Generalized Variable Importance Metric (GVIM)'' using the true conditional
expectation function for a continuous or a binary response variable. We further
showed that the defined GVIM can be represented as a function of the
Conditional Average Treatment Effect (CATE) for multinomial and continuous
predictors. Then we propose how the metric can be estimated using any
machine learning model. Finally, using simulations, we evaluated the properties
of the estimator when estimated from XGBoost, Random Forest and a mis-specified
generalized additive model.
( 2
min )
When systems use data-based models that are based on machine learning (ML),
errors in their results cannot be ruled out. This is particularly critical if
it remains unclear to the user how these models arrived at their decisions and
if errors can have safety-relevant consequences, as is often the case in the
medical field. In such cases, the use of dependable methods to quantify the
uncertainty remaining in a result allows the user to make an informed decision
about further usage and draw possible conclusions based on a given result. This
paper demonstrates the applicability and practical utility of the Uncertainty
Wrapper using flow cytometry as an application from the medical field that can
benefit from the use of ML models in conjunction with dependable and
transparent uncertainty quantification.
( 2
min )
In the recent past, using machine learning (ML) to make predictions, especially for data in the form of text and images, required extensive ML knowledge for creating and tuning of deep learning models. Today, ML has become more accessible to any user who wants to use ML models to generate business value. With Amazon SageMaker […]
( 7
min )
Creating high-performance machine learning (ML) solutions relies on exploring and optimizing training parameters, also known as hyperparameters. Hyperparameters are the knobs and levers that we use to adjust the training process, such as learning rate, batch size, regularization strength, and others, depending on the specific model and task at hand. Exploring hyperparameters involves systematically varying […]
( 20
min )
Thanks to a viral trend sweeping social media, we now know some men think about the Roman Empire every day. And thanks to Luke Farritor, a 21-year-old computer science undergrad at the University of Nebraska-Lincoln, and like-minded AI enthusiasts, there might soon be a lot more to think about. Blending a passion for history with
( 6
min )
Fetal brain MRI is becoming an increasingly relevant complement to
neurosonography for perinatal diagnosis, allowing fundamental insights into
fetal brain development throughout gestation. However, uncontrolled fetal
motion and heterogeneity in acquisition protocols lead to data of variable
quality, potentially biasing the outcome of subsequent studies. We present
FetMRQC, an open-source machine-learning framework for automated image quality
assessment and quality control that is robust to domain shifts induced by the
heterogeneity of clinical data. FetMRQC extracts an ensemble of quality metrics
from unprocessed anatomical MRI and combines them to predict experts' ratings
using random forests. We validate our framework on an exceptionally large and
diverse dataset of more than 1600 manually rated fetal brain T2-weighted images
from four clinical centers and 13 different scanners. Our study shows that
FetMRQC's predictions generalize well to unseen data while being interpretable.
FetMRQC is a step towards more robust fetal brain neuroimaging, which has the
potential to shed new insights on the developing human brain.
( 3
min )
Deep learning has taken by storm all fields involved in data analysis,
including remote sensing for Earth observation. However, despite significant
advances in terms of performance, its lack of explainability and
interpretability, inherent to neural networks in general since their inception,
remains a major source of criticism. Hence it comes as no surprise that the
expansion of deep learning methods in remote sensing is being accompanied by
increasingly intensive efforts oriented towards addressing this drawback
through the exploration of a wide spectrum of Explainable Artificial
Intelligence techniques. This chapter, organized according to prominent Earth
observation application fields, presents a panorama of the state-of-the-art in
explainable remote sensing image analysis.
( 2
min )
We introduce the text-to-instrument task, which aims at generating
sample-based musical instruments based on textual prompts. Accordingly, we
propose InstrumentGen, a model that extends a text-prompted generative audio
framework to condition on instrument family, source type, pitch (across an
88-key spectrum), velocity, and a joint text/audio embedding. Furthermore, we
present a differentiable loss function to evaluate the intra-instrument timbral
consistency of sample-based instruments. Our results establish a foundational
text-to-instrument baseline, extending research in the domain of automatic
sample-based instrument generation.
( 2
min )
Recent AI research has significantly reduced the barriers to applying AI, but
the process of setting up the necessary tools and frameworks can still be a
challenge. While AI-as-a-Service platforms have emerged to simplify the
training and deployment of AI models, they still fall short of achieving true
democratization of AI. In this paper, we aim to address this gap by comparing
several popular AI-as-a-Service platforms and identifying the key requirements
for a platform that can achieve true democratization of AI. Our analysis
highlights the need for self-hosting options, high scalability, and openness.
To address these requirements, we propose our approach: the "Open Space for
Machine Learning" platform. Our platform is built on cutting-edge technologies
such as Kubernetes, Kubeflow Pipelines, and Ludwig, enabling us to overcome the
challenges of democratizing AI. We argue that our approach is more
comprehensive and effective in meeting the requirements of democratizing AI
than existing AI-as-a-Service platforms.
( 2
min )
The electrocardiogram (ECG) is a dependable instrument for assessing the
function of the cardiovascular system. There has recently been much emphasis on
precisely classifying ECGs. While ECG conditions share numerous similarities,
little attention has been paid to categorizing ECGs using graph neural
networks. In this study, we offer three distinct techniques that use deep
graph neural networks to classify heartbeats from ECG signals
accurately. We suggest using different methods to extract topological features
from the ECG signal and then using a branch of the graph neural network named
graph isomorphism network for classifying the ECGs. On the PTB Diagnostics data
set, we tested the three proposed techniques. According to the findings, the
three proposed techniques are capable of making arrhythmia classification
predictions with accuracies of 99.38, 98.76, and 91.93 percent, respectively.
( 2
min )
This study presents an innovative method for predicting the market value of
professional soccer players using explainable machine learning models. Using a
dataset curated from the FIFA website, we employ an ensemble machine learning
approach coupled with Shapley Additive exPlanations (SHAP) to provide detailed
explanations of the models' predictions. The GBDT model achieves the highest
mean R-Squared (0.8780) and the lowest mean Root Mean Squared Error
(3,221,632.175), indicating its superior performance among the evaluated
models. Our analysis reveals that specific skills such as ball control, short
passing, finishing, interceptions, dribbling, and tackling are paramount within
the skill dimension, whereas sprint speed and acceleration are critical in the
fitness dimension, and reactions are preeminent in the cognitive dimension. Our
results offer a more accurate, objective, and consistent framework for market
value estimation, presenting useful insights for managerial decisions in player
transfers.
( 2
min )
Black-box variational inference performance is sometimes hindered by the use
of gradient estimators with high variance. This variance comes from two sources
of randomness: Data subsampling and Monte Carlo sampling. While existing
control variates only address Monte Carlo noise, and incremental gradient
methods typically only address data subsampling, we propose a new "joint"
control variate that jointly reduces variance from both sources of noise. This
significantly reduces gradient variance, leading to faster optimization in
several applications.
( 2
min )
Contrastive learning has recently emerged as a promising approach for
learning data representations that discover and disentangle the explanatory
factors of the data. Previous analyses of such approaches have largely focused
on individual contrastive losses, such as noise-contrastive estimation (NCE)
and InfoNCE, and rely on specific assumptions about the data generating
process. This paper extends the theoretical guarantees for disentanglement to a
broader family of contrastive methods, while also relaxing the assumptions
about the data distribution. Specifically, we prove identifiability of the true
latents for four contrastive losses studied in this paper, without imposing
common independence assumptions. The theoretical findings are validated on
several benchmark datasets. Finally, practical limitations of these methods are
also investigated.
( 2
min )
In this paper, we develop data-dependent and algorithm-dependent
generalization bounds for transductive learning algorithms in the context of
information theory for the first time. We show that the generalization gap of
transductive learning algorithms can be bounded by the mutual information
between the training labels and the hypothesis. By innovatively proposing the concept
of transductive supersamples, we go beyond the inductive learning setting and
establish upper bounds in terms of various information measures. Furthermore,
we derive novel PAC-Bayesian bounds and build the connection between
generalization and loss landscape flatness under the transductive learning
setting. Finally, we present the upper bounds for adaptive optimization
algorithms and demonstrate the applications of results on semi-supervised
learning and graph learning scenarios. Our theoretic results are validated on
both synthetic and real-world datasets.
( 2
min )
The rising popularity of artificial intelligence in healthcare is
highlighting the problem that a computational model achieving super-human
clinical performance at its training sites may perform substantially worse at
new sites. In this perspective, we present common sources for this failure to
transport, which we divide into sources under the control of the experimenter
and sources inherent to the clinical data-generating process. Of the inherent
sources we look a little deeper into site-specific clinical practices that can
affect the data distribution, and propose a potential solution intended to
isolate the imprint of those practices on the data from the patterns of disease
cause and effect that are the usual target of clinical models.
( 2
min )
In this paper, we present new high-probability PAC-Bayes bounds for different
types of losses. Firstly, for losses with a bounded range, we recover a
strengthened version of Catoni's bound that holds uniformly for all parameter
values. This leads to new fast rate and mixed rate bounds that are
interpretable and tighter than previous bounds in the literature. In
particular, the fast rate bound is equivalent to the Seeger--Langford bound.
Secondly, for losses with more general tail behaviors, we introduce two new
parameter-free bounds: a PAC-Bayes Chernoff analogue when the loss's cumulant
generating function is bounded, and a bound when the loss's second moment is
bounded. These two bounds are obtained using a new technique based on a
discretization of the space of possible events for the "in probability"
parameter optimization problem. This technique is both simpler and more general
than previous approaches optimizing over a grid on the parameters' space.
Finally, we extend all previous results to anytime-valid bounds using a simple
technique applicable to any existing bound.
( 2
min )
Neural networks have shown remarkable performance in computer vision, but
their deployment in numerous scientific and technical fields is challenging due
to their black-box nature. Scientists and practitioners need to evaluate the
reliability of a decision, i.e., to know simultaneously if a model relies on
the relevant features and whether these features are robust to image
corruptions. Existing attribution methods aim to provide human-understandable
explanations by highlighting important regions in the image domain, but fail to
fully characterize a decision process's reliability. To bridge this gap, we
introduce the Wavelet sCale Attribution Method (WCAM), a generalization of
attribution from the pixel domain to the space-scale domain using wavelet
transforms. Attribution in the wavelet domain reveals where and on what scales
the model focuses, thus enabling us to assess whether a decision is reliable.
Our code is accessible here:
\url{https://github.com/gabrielkasmi/spectral-attribution}.
( 2
min )
Good data stewardship requires removal of data at the request of the data's
owner. This raises the question if and how a trained machine-learning model,
which implicitly stores information about its training data, should be affected
by such a removal request. Is it possible to "remove" data from a
machine-learning model? We study this problem by defining certified removal: a
very strong theoretical guarantee that a model from which data is removed
cannot be distinguished from a model that never observed the data to begin
with. We develop a certified-removal mechanism for linear classifiers and
empirically study learning settings in which this mechanism is practical.
( 2
min )
In this paper, we propose to develop a new Cram\'er-Rao Bound (CRB) when the
parameter to estimate lies in a manifold and follows a prior distribution. This
derivation leads to a natural inequality between an error criterion based on
geometrical properties and this new bound. This main contribution is
illustrated in the problem of covariance estimation when the data follow a
Gaussian distribution and the prior distribution is an inverse Wishart.
Numerical simulations show new results where the proposed CRB allows us to exhibit
interesting properties of the MAP estimator which are not observed with the
classical Bayesian CRB.
( 2
min )
This paper establishes the nearly optimal rate of approximation for deep
neural networks (DNNs) when applied to Korobov functions, effectively
overcoming the curse of dimensionality. The approximation results presented in
this paper are measured with respect to $L_p$ norms and $H^1$ norms. Our
achieved approximation rate demonstrates a remarkable "super-convergence" rate,
outperforming traditional methods and any continuous function approximator.
These results are non-asymptotic, providing error bounds that consider both the
width and depth of the networks simultaneously.
( 2
min )
Happ and Greven (2018) developed a methodology for principal components
analysis of multivariate functional data observed on domains of different
dimensions. Their approach relies on an estimation of univariate
functional principal components for each univariate functional feature. In this
paper, we present extensive simulations to investigate choosing the number of
principal components to retain. We show empirically that the conventional
approach of using a percentage of variance explained threshold for each
univariate functional feature may be unreliable when aiming to explain an
overall percentage of variance in the multivariate functional data, and thus we
advise practitioners to be careful when using it.
( 2
min )
Building out a machine learning operations (MLOps) platform in the rapidly evolving landscape of artificial intelligence (AI) and machine learning (ML) for organizations is essential for seamlessly bridging the gap between data science experimentation and deployment while meeting the requirements around model performance, security, and compliance. In order to fulfill regulatory and compliance requirements, the […]
( 17
min )
Generative AI models for coding companions are mostly trained on publicly available source code and natural language text. While the large size of the training corpus enables the models to generate code for commonly used functionality, these models are unaware of code in private repositories and the associated coding styles that are enforced when developing […]
( 11
min )
Wield the blade and embrace the way of the samurai for some thrilling action — Onimusha: Warlords comes to GeForce NOW this week. Members can experience feudal Japan in this hack-and-slash adventure game in the cloud. It’s part of an action-packed GFN Thursday, with 16 more games joining the cloud gaming platform’s library.
( 5
min )
Working together to create open-source and private datasets for AI training.
( 2
min )
It is commonly recognized that the expressiveness of deep neural networks is
contingent upon a range of factors, encompassing their depth, width, and other
relevant considerations. Currently, the practical performance of the majority
of deep neural networks remains uncertain. For ReLU (Rectified Linear Unit)
networks with piecewise linear activations, the number of linear convex regions
serves as a natural metric to gauge the network's expressivity. In this paper,
we count the number of linear convex regions in deep neural networks based on
ReLU. In particular, we prove that for any one-dimensional input, there exists
a minimum threshold for the number of neurons required to express it. We also
empirically observe that for the same network, intricate inputs hinder its
capacity to express linear regions. Furthermore, we unveil the iterative
refinement process of decision boundaries in ReLU networks during training. We
aspire for our research to serve as an inspiration for network optimization
endeavors and to aid in the exploration and analysis of the behaviors exhibited
by deep networks.
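A quick way to probe this empirically for a one-dimensional input is to count distinct ReLU activation patterns along a fine input grid, since each pattern corresponds to one linear region. The sampling sketch below can only lower-bound the true count (very narrow regions may fall between grid points):

```python
import numpy as np

def count_1d_regions(weights, biases, xs):
    """Count distinct ReLU activation patterns along a 1D input grid; each
    distinct pattern corresponds to one linear region of the network."""
    patterns = set()
    for x in xs:
        h, bits = np.array([x]), []
        for W, b in zip(weights, biases):
            h = W @ h + b
            bits.extend(h > 0)          # record this layer's activation pattern
            h = np.maximum(h, 0.0)
        patterns.add(tuple(bits))
    return len(patterns)

rng = np.random.default_rng(0)
Ws = [rng.standard_normal((16, 1)), rng.standard_normal((16, 16))]
bs = [rng.standard_normal(16), rng.standard_normal(16)]
print(count_1d_regions(Ws, bs, np.linspace(-5.0, 5.0, 20001)))
```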
( 2
min )
Minimum Description Length (MDL) estimators, using two-part codes for
universal coding, are analyzed. For general parametric families under certain
regularity conditions, we introduce a two-part code whose regret is close to
the minimax regret, where regret of a code with respect to a target family M is
the difference between the code length of the code and the ideal code length
achieved by an element in M. This is a generalization of the result for
exponential families by Gr\"unwald. Our code is constructed by using an
augmented structure of M with a bundle of local exponential families for data
description, which is not needed for exponential families. This result gives a
tight upper bound on risk and loss of the MDL estimators based on the theory
introduced by Barron and Cover in 1991. Further, we show that we can apply the
result to mixture families, which are a typical example of non-exponential
families.
( 2
min )
The diffusion model has shown remarkable success in computer vision, but it
remains unclear whether the ODE-based probability flow or the SDE-based
diffusion model is superior, and under what circumstances. Comparing the
two is challenging due to dependencies on data distributions, score training,
and other numerical issues. In this paper, we study the problem mathematically
for two limiting scenarios: the zero diffusion (ODE) case and the large
diffusion case. We first introduce a pulse-shape error to perturb the score
function and analyze error accumulation of sampling quality, followed by a
thorough analysis for generalization to arbitrary error. Our findings indicate
that when the perturbation occurs at the end of the generative process, the ODE
model outperforms the SDE model with a large diffusion coefficient. However,
when the perturbation occurs earlier, the SDE model outperforms the ODE model,
and we demonstrate that the error of sample generation due to such a
pulse-shape perturbation is exponentially suppressed as the diffusion term's
magnitude increases to infinity. Numerical validation of this phenomenon is
provided using Gaussian, Gaussian mixture, and Swiss roll distributions, as well
as realistic datasets like MNIST and CIFAR-10.
( 2
min )
Accurate detection of human presence in indoor environments is important for
various applications, such as energy management and security. In this paper, we
propose a novel system for human presence detection using the channel state
information (CSI) of WiFi signals. Our system named attention-enhanced deep
learning for presence detection (ALPD) employs an attention mechanism to
automatically select informative subcarriers from the CSI data and a
bidirectional long short-term memory (LSTM) network to capture temporal
dependencies in CSI. Additionally, we utilize a static feature to improve the
accuracy of human presence detection in static states. We evaluate the proposed
ALPD system by deploying a pair of WiFi access points (APs) to collect a CSI
dataset, and compare it against several benchmarks. The results
demonstrate that our ALPD system outperforms the benchmarks in terms of
accuracy, especially in the presence of interference. Moreover, bidirectional
transmission data benefits training by improving stability and accuracy,
as well as reducing the costs of data collection for training. Overall, our
proposed ALPD system shows promising results for human presence detection using
WiFi CSI signals.
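A minimal sketch of what such an architecture might look like (the layer sizes, the subcarrier-attention design, and the two-way classification head are all guesses from the abstract, not the authors' implementation):

```python
import torch
import torch.nn as nn

class PresenceNet(nn.Module):
    """Attention over CSI subcarriers followed by a BiLSTM over time."""
    def __init__(self, n_subcarriers=56, hidden=64):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(n_subcarriers, n_subcarriers),
                                  nn.Softmax(dim=-1))
        self.lstm = nn.LSTM(n_subcarriers, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)        # present / absent

    def forward(self, csi):                         # csi: (batch, time, subcarriers)
        weights = self.attn(csi.mean(dim=1))        # one weight per subcarrier
        out, _ = self.lstm(csi * weights.unsqueeze(1))
        return self.head(out[:, -1])                # classify from the last step

logits = PresenceNet()(torch.randn(8, 100, 56))     # smoke test on fake CSI
```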
( 2
min )
We consider two popular approaches to Knowledge Graph Completion (KGC):
textual models that rely on textual entity descriptions, and structure-based
models that exploit the connectivity structure of the Knowledge Graph (KG).
Preliminary experiments show that these approaches have complementary
strengths: structure-based models perform well when the gold answer is easily
reachable from the query head in the KG, while textual models exploit
descriptions to give good performance even when the gold answer is not
reachable. In response, we explore ensembling as a way of combining the best of
both approaches. We propose a novel method for learning query-dependent
ensemble weights by using the distributions of scores assigned by individual
models to all candidate entities. Our ensemble baseline achieves
state-of-the-art results on three standard KGC datasets, with up to 6.8 pt MRR
and 8.3 pt Hits@1 gains over best individual models.
( 2
min )
There is currently a large gap in performance between the statistically
rigorous methods like linear regression or additive splines and the powerful
deep methods using neural networks. Previous works attempting to close this gap
have failed to fully investigate the exponentially growing number of feature
combinations which deep networks consider automatically during training. In
this work, we develop a tractable selection algorithm to efficiently identify
the necessary feature combinations by leveraging techniques in feature
interaction detection. Our proposed Sparse Interaction Additive Networks (SIAN)
construct a bridge from these simple and interpretable models to fully
connected neural networks. SIAN achieves competitive performance against
state-of-the-art methods across multiple large-scale tabular datasets and
consistently finds an optimal tradeoff between the modeling capacity of neural
networks and the generalizability of simpler methods.
( 2
min )
Large Language Models (LLMs) are huge artificial neural networks which
primarily serve to generate text, but also provide a very sophisticated
probabilistic model of language use. Since generating a semantically consistent
text requires a form of effective memory, we investigate the memory properties
of LLMs and find surprising similarities with key characteristics of human
memory. This result strongly suggests that the biological features of human
memory leave an imprint on the way that we structure our textual narratives.
( 2
min )
We study differentially private stochastic convex optimization (DP-SCO) under
user-level privacy, where each user may hold multiple data items. Existing work
for user-level DP-SCO either requires super-polynomial runtime [Ghazi et al.
(2023)] or requires the number of users to grow polynomially with the
dimensionality of the problem with additional strict assumptions [Bassily et
al. (2023)]. We develop new algorithms for user-level DP-SCO that obtain
optimal rates for both convex and strongly convex functions in polynomial time
and require the number of users to grow only logarithmically in the dimension.
Moreover, our algorithms are the first to obtain optimal rates for non-smooth
functions in polynomial time. These algorithms are based on multiple-pass
DP-SGD, combined with a novel private mean estimation procedure for
concentrated data, which applies an outlier removal step before estimating the
mean of the gradients.
( 2
min )
In this paper, we present the results of the NeurIPS-2022 Neural MMO
Challenge, which attracted 500 participants and received over 1,600
submissions. Like the previous IJCAI-2022 Neural MMO Challenge, it involved
agents from 16 populations surviving in procedurally generated worlds by
collecting resources and defeating opponents. This year's competition ran on
the latest v1.6 Neural MMO, which introduces new equipment, combat, trading,
and a better scoring system. These elements combine to pose additional
robustness and generalization challenges not present in previous competitions.
This paper summarizes the design and results of the challenge, explores the
potential of this environment as a benchmark for learning methods, and presents
some practical reinforcement learning training approaches for complex tasks
with sparse rewards. Additionally, we have open-sourced our baselines,
including environment wrappers, benchmarks, and visualization tools for future
research.
( 2
min )
Discriminatively trained, deterministic neural networks are the de facto
choice for classification problems. However, even though they achieve
state-of-the-art results on in-domain test sets, they tend to be overconfident
on out-of-distribution (OOD) data. For instance, ReLU networks -- a popular
class of neural network architectures -- have been shown to almost always yield
high confidence predictions when the test data are far away from the training
set, even when they are trained with OOD data. We overcome this problem by
adding a term to the output of the neural network that corresponds to the logit
of an extra class, which we design to dominate the logits of the original
classes as we move away from the training data. This technique provably prevents
arbitrarily high confidence on far-away test data while maintaining a simple
discriminative point-estimate training. Evaluation on various benchmarks
demonstrates strong performance against competitive baselines on both far-away
and realistic OOD data.
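A toy sketch of the extra-logit idea follows. Here the extra logit grows with a simple distance-to-training-centroids proxy; the paper constructs this logit so that domination is provable, which this illustrative proxy does not guarantee:

```python
import torch
import torch.nn as nn

class ExtraClassWrapper(nn.Module):
    """Append an OOD logit that grows as the input moves away from
    precomputed training centroids (an illustrative stand-in for the
    paper's carefully designed extra-class logit)."""
    def __init__(self, net, centroids, alpha=1.0):
        super().__init__()
        self.net, self.alpha = net, alpha
        self.register_buffer("centroids", centroids)     # (K, d) centroids

    def forward(self, x):
        logits = self.net(x)                                       # (B, C)
        d = torch.cdist(x.flatten(1), self.centroids).min(dim=1).values
        extra = self.alpha * d.unsqueeze(1) - 1.0                  # grows with distance
        return torch.cat([logits, extra], dim=1)                   # (B, C + 1)
```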
( 2
min )
Federated learning (FL) has shown promising potential in safeguarding data
privacy in healthcare collaborations. While the term "FL" was originally coined
by the engineering community, the statistical field has also explored similar
privacy-preserving algorithms. Statistical FL algorithms, however, remain
considerably less recognized than their engineering counterparts. Our goal was
to bridge the gap by presenting the first comprehensive comparison of FL
frameworks from both engineering and statistical domains. We evaluated five FL
frameworks using both simulated and real-world data. The results indicate that
statistical FL algorithms yield less biased point estimates for model
coefficients and offer convenient confidence interval estimations. In contrast,
engineering-based methods tend to generate more accurate predictions, sometimes
surpassing central pooled and statistical FL models. This study underscores the
relative strengths and weaknesses of both types of methods, emphasizing the
need for increased awareness and their integration in future FL applications.
( 2
min )
In this paper, neural network approximation methods are developed for
elliptic partial differential equations with multi-frequency solutions. Neural
network approximation methods have advantages over classical approaches in
that they can be applied without much concern about the form of the differential
equations or the shape or dimension of the problem domain. When applied to
problems with multi-frequency solutions, the performance and accuracy of neural
network approximation methods are strongly affected by the contrast of the
high- and low-frequency parts in the solutions. To address this issue, domain
scaling and residual correction methods are proposed. The efficiency and
accuracy of the proposed methods are demonstrated for multi-frequency model
problems.
( 2
min )
Research in scientific disciplines evolves, often rapidly, over time with the
emergence of novel methodologies and their associated terminologies. While
methodologies themselves are conceptual in nature and rather difficult to
extract and characterise automatically, in this paper we seek to develop
supervised models for automatic extraction of the names of the various
constituents of a methodology, e.g., `R-CNN', `ELMo' etc. The main research
challenge for this task is effectively modeling the contexts around these
methodology component names in a few-shot or even a zero-shot setting. The main
contributions of this paper towards effectively identifying new evolving
scientific methodology names are as follows: i) we propose a factored approach
to sequence modeling, which leverages a broad-level category information of
methodology domains, e.g., `NLP', `RL' etc.; ii) to demonstrate the feasibility
of our proposed approach of identifying methodology component names under a
practical setting of fast evolving AI literature, we conduct experiments
following a simulated chronological setup (newer methodologies not seen during
the training process); iii) our experiments demonstrate that the factored
approach outperforms state-of-the-art baselines by margins of up to 9.257\% for
the methodology extraction task with the few-shot setup.
( 2
min )
We present a new high-level synthesis methodology for using large language
model tools to generate hardware designs. The methodology uses exclusively
open-source tools excluding the large language model. As a case study, we use
our methodology to generate a permuted congruential random number generator
design with a wishbone interface. We verify the functionality and quality of
the random number generator design using large language model-generated
simulations and the Dieharder randomness test suite. We document all the large
language model chat logs, Python scripts, Verilog scripts, and simulation
results used in the case study. We believe that our method of hardware design
generation coupled with the open source silicon 130 nm design tools will
revolutionize application-specific integrated circuit design. Our methodology
significantly lowers the barrier to entry when building domain-specific computing
accelerators for the Internet of Things and proof of concept prototypes for
later fabrication in more modern process nodes.
( 2
min )
Federated learning (FL) is an emerging paradigm for training deep neural
networks (DNNs) in a distributed manner. Current FL approaches all suffer from
high communication overhead and information leakage. In this work, we present a
federated learning algorithm based on evolution strategies (FedES), a
zeroth-order training method. Instead of transmitting model parameters, FedES
only communicates loss values, and thus has very low communication overhead.
Moreover, a third party is unable to estimate gradients without knowing the
pre-shared seed, which protects data privacy. Experimental results demonstrate
FedES can achieve the above benefits while keeping convergence performance the
same as that with back propagation methods.
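A rough single-round sketch of the seed-sharing idea, assuming each client exposes a local loss function (hypothetical names; not the authors' implementation):

```python
import numpy as np

def fedes_round(theta, local_loss_fns, seed, pop=32, sigma=0.1, lr=0.05):
    """One round: every party regenerates identical perturbations from the
    pre-shared seed, so clients transmit only scalar losses (no parameters,
    no gradients) and the server reassembles the ES update."""
    rng = np.random.default_rng(seed)                    # pre-shared seed
    eps = rng.standard_normal((pop, theta.size))
    # Clients: evaluate the local loss at each perturbed parameter vector.
    losses = np.mean([[f(theta + sigma * e) for e in eps]
                      for f in local_loss_fns], axis=0)  # server averages scalars
    adv = (losses - losses.mean()) / (losses.std() + 1e-8)
    grad = adv @ eps / (pop * sigma)                     # ES gradient estimate
    return theta - lr * grad
```

Without the seed, a third party observing only the scalar losses cannot reconstruct the perturbations, which is the privacy argument sketched in the abstract.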
( 2
min )
One of the most promising developments in computer vision in recent years is
the use of generative neural networks for functionality-conditioned 3D
design reconstruction and generation. Here, neural networks learn dependencies
between functionalities and a geometry in a very effective way. For a neural
network, the functionalities are translated into conditions on a certain
geometry. But the more conditions the design generation needs to reflect, the
more difficult it is to learn clear dependencies. This leads to a
multi-criteria design problem due to the various conditions, which have so far
not been considered in the neural network structure.
In this paper, we address this multi-criteria challenge for a 3D design use
case related to an unmanned aerial vehicle (UAV) motor mount. We generate
10,000 abstract 3D designs and subject them all to simulations for three
physical disciplines: mechanics, thermodynamics, and aerodynamics. Then, we
train a Conditional Variational Autoencoder (CVAE) using the geometry and
corresponding multi-criteria functional constraints as input. We use our
trained CVAE together with the Marching Cubes algorithm to generate meshes for
simulation-based evaluation of the generated UAV designs. Subsequently, we
demonstrate the ability to generate optimized designs
under self-defined functionality conditions using the trained neural network.
( 3
min )
Consistency-based diagnosis is an established approach to diagnosing technical
applications, but it requires significant modeling effort, especially for
dynamic multi-modal time series. Machine learning seems to be an obvious
solution, which becomes less obvious when looking at details: Which notion of
consistency can be used? If logical calculi are still to be used, how can
dynamic time series be transferred into the discrete world?
This paper presents the methodology Discret2Di for automated learning of
logical expressions for consistency-based diagnosis. While these logical
calculi have advantages by providing a clear notion of consistency, they have
the key problem of relying on a discretization of the dynamic system. The
solution presented combines machine learning from both the time series and the
symbolic domain to automate the learning of logical rules for consistency-based
diagnosis.
( 2
min )
The adoption of diagnosis and prognostic algorithms in healthcare has led to
concerns about the perpetuation of bias against disadvantaged groups of
individuals. Deep learning methods to detect and mitigate bias have revolved
around modifying models, optimization strategies, and threshold calibration
with varying levels of success. Here, we develop a data-centric,
model-agnostic, task-agnostic approach to evaluating dataset bias by
investigating how easily different groups are learned
at small sample sizes (AEquity). We then apply a systematic analysis of AEq
values across subpopulations to identify and mitigate manifestations of racial
bias in two known cases in healthcare - chest X-ray diagnosis with deep
convolutional neural networks and healthcare utilization prediction with
multivariate logistic regression. AEq is a novel and broadly applicable metric
that can be applied to advance equity by diagnosing and remediating bias in
healthcare datasets.
( 2
min )
Visualization tools can help synthetic biologists and molecular programmers
understand the complex reactive pathways of nucleic acid reactions, which can
be designed for many potential applications and can be modelled using a
continuous-time Markov chain (CTMC). Here we present ViDa, a new visualization
approach for DNA reaction trajectories that uses a 2D embedding of the
secondary structure state space underlying the CTMC model. To this end, we
integrate a scattering transform of the secondary structure adjacency, a
variational autoencoder, and a nonlinear dimensionality reduction method. We
augment the training loss with domain-specific supervised terms that capture
both thermodynamic and kinetic features. We assess ViDa on two well-studied DNA
hybridization reactions. Our results demonstrate that the domain-specific
features lead to significant quality improvements over the state-of-the-art in
DNA state space visualization, successfully separating different folding
pathways and thus providing useful insights into dominant reaction mechanisms.
( 2
min )
We attempt to generate new bridge types using generative artificial
intelligence technology. Grayscale images of bridge facades with varying
component widths were rendered with the 3dsMax animation software, and the
OpenCV module then applied a set of geometric transformations (rotation,
horizontal scaling, vertical scaling) to obtain an image dataset of three-span
beam bridges, arch bridges, cable-stayed bridges and suspension bridges. Using
the Python programming language with the TensorFlow and Keras deep learning
frameworks, a variational autoencoder was constructed and trained, yielding a
low-dimensional bridge-type latent space that is convenient for vector
operations. The variational autoencoder can combine two existing,
human-designed bridge types into a new bridge type. Generative artificial
intelligence technology can assist bridge designers in bridge-type innovation,
serving as a copilot.
( 2
min )
We introduce AdaSub, a stochastic optimization algorithm that computes a
search direction based on second-order information in a low-dimensional
subspace that is defined adaptively based on available current and past
information. Compared to first-order methods, second-order methods exhibit
better convergence characteristics, but the need to compute the Hessian matrix
at each iteration results in excessive computational expenses, making them
impractical. To address this issue, our approach manages the computational
expense and algorithm efficiency by allowing the selection of
the subspace dimension for the search. Our code is freely available on GitHub,
and our preliminary numerical results demonstrate that AdaSub surpasses popular
stochastic optimizers in terms of time and number of iterations required to
reach a given accuracy.
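A sketch of one subspace Newton step in this spirit, assuming callables for the gradient and for Hessian-vector products (an illustration of the idea, not the authors' exact update rule; the damping term is an assumption to keep the small system solvable):

```python
import numpy as np

def adasub_step(x, grad_fn, hvp_fn, grad_history, k=5, damping=1e-8):
    """One subspace Newton step: restrict curvature to the span of the last
    k gradients, solve the small k-by-k system, and step along the result."""
    g = grad_fn(x)
    grad_history.append(g)
    V, _ = np.linalg.qr(np.stack(grad_history[-k:], axis=1))  # subspace basis
    HV = np.stack([hvp_fn(x, V[:, j]) for j in range(V.shape[1])], axis=1)
    H_small = V.T @ HV                                        # projected Hessian
    step = V @ np.linalg.solve(H_small + damping * np.eye(V.shape[1]), V.T @ g)
    return x - step
```

Only k Hessian-vector products are needed per step, which is how such methods keep the second-order cost proportional to the chosen subspace dimension.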
( 2
min )
As control engineering methods are applied to increasingly complex systems,
data-driven approaches for system identification appear as a promising
alternative to physics-based modeling. While the Bayesian approaches prevalent
for safety-critical applications usually rely on the availability of state
measurements, the states of a complex system are often not directly measurable.
It may then be necessary to jointly estimate the dynamics and the latent state,
making the quantification of uncertainties and the design of controllers with
formal performance guarantees considerably more challenging. This paper
proposes a novel method for the computation of an optimal input trajectory for
unknown nonlinear systems with latent states based on a combination of particle
Markov chain Monte Carlo methods and scenario theory. Probabilistic performance
guarantees are derived for the resulting input trajectory, and an approach to
validate the performance of arbitrary control laws is presented. The
effectiveness of the proposed method is demonstrated in a numerical simulation.
( 2
min )
The mean shift (MS) algorithm seeks a mode of the kernel density estimate
(KDE). This study presents a convergence guarantee of the mode estimate
sequence generated by the MS algorithm and an evaluation of the convergence
rate, under fairly mild conditions, with the help of the argument concerning
the {\L}ojasiewicz inequality. Our findings extend existing ones covering
analytic kernels and the Epanechnikov kernel. They are significant in that
they cover the biweight kernel, which is optimal among non-negative kernels in
terms of the asymptotic statistical efficiency for the KDE-based mode
estimation.
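For concreteness, the MS update replaces the current point by a KDE-weighted mean of the data. A one-dimensional sketch with a Gaussian kernel follows (the analysis above notably also covers the non-smooth biweight kernel, which this sketch does not use):

```python
import numpy as np

def mean_shift_mode(x0, data, h=0.5, iters=200, tol=1e-10):
    """Mean shift: iterate the kernel-weighted mean toward a KDE mode."""
    x = float(x0)
    for _ in range(iters):
        w = np.exp(-0.5 * ((data - x) / h) ** 2)   # Gaussian kernel weights
        x_new = float((w * data).sum() / w.sum())  # the MS update
        if abs(x_new - x) < tol:
            break
        x = x_new
    return x

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
print(mean_shift_mode(4.0, data))   # converges to a mode near 5
```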
( 2
min )
We present an exact Bayesian inference method for discrete statistical
models, which can find exact solutions to a large class of discrete inference
problems, even with infinite support and continuous priors. To express such
models, we introduce a probabilistic programming language that supports
discrete and continuous sampling, discrete observations, affine functions,
(stochastic) branching, and conditioning on discrete events. Our key tool is
probability generating functions: they provide a compact closed-form
representation of distributions that are definable by programs, thus enabling
the exact computation of posterior probabilities, expectation, variance, and
higher moments. Our inference method is provably correct and fully automated in
a tool called Genfer, which uses automatic differentiation (specifically,
Taylor polynomials), but does not require computer algebra. Our experiments
show that Genfer is often faster than the existing exact inference tools PSI,
Dice, and Prodigy. On a range of real-world inference problems that none of
these exact tools can solve, Genfer's performance is competitive with
approximate Monte Carlo methods, while avoiding approximation errors.
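To illustrate the key tool (probability generating functions) rather than Genfer itself: a PGF represents a distribution compactly, and the PGF of a sum of independent variables is the product of the individual PGFs, so exact distributions follow from polynomial multiplication:

```python
import numpy as np

# PGF of a fair die, G(x) = (x + x^2 + ... + x^6) / 6, as a coefficient vector.
die = np.array([0, 1, 1, 1, 1, 1, 1]) / 6.0
# Independence: the PGF of a sum is the product of PGFs, so the coefficient
# vector of the sum's distribution is a polynomial product (a convolution).
two_dice = np.convolve(die, die)
print(two_dice[7])   # P(first + second = 7) = 6/36, about 0.1667
```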
( 2
min )
We study the training dynamics of a shallow neural network with quadratic
activation functions and quadratic cost in a teacher-student setup. In line
with previous works on the same neural architecture, the optimization is
performed following the gradient flow on the population risk, where the average
over data points is replaced by the expectation over their distribution,
assumed to be Gaussian.We first derive convergence properties for the gradient
flow and quantify the overparameterization that is necessary to achieve a
strong signal recovery. Then, assuming that the teachers and the students at
initialization form independent orthonormal families, we derive a
high-dimensional limit for the flow and show that the minimal
overparameterization is sufficient for strong recovery. We verify by numerical
experiments that these results hold for more general initializations.
( 2
min )
We study scalable machine learning models for full event reconstruction in
high-energy electron-positron collisions based on a highly granular detector
simulation. Particle-flow reconstruction can be formulated as a supervised
learning task using tracks and calorimeter clusters or hits. We compare a graph
neural network and kernel-based transformer and demonstrate that both avoid
quadratic memory allocation and computational cost while achieving realistic
reconstruction. We show that hyperparameter tuning on a supercomputer
significantly enhances the physics performance of the models, improving the jet
transverse momentum resolution by up to 50% compared to the baseline. The
resulting model is highly portable across hardware processors. Finally, we
demonstrate that the model can be trained on highly granular inputs consisting
of tracks and calorimeter hits, resulting in a competitive physics performance
with the baseline. Datasets and software to reproduce the studies are published
following the findable, accessible, interoperable, and reusable principles.
( 2
min )
The aim of this paper is to make clear and precise the relationship between
the Rubin causal model (RCM) and structural causal model (SCM) frameworks for
causal inference. Adopting a neutral logical perspective, and drawing on
previous work, we show what is required for an RCM to be representable by an
SCM. A key result then shows that every RCM -- including those that violate
algebraic principles implied by the SCM framework -- emerges as an abstraction
of some representable RCM. Finally, we illustrate the power of this
conciliatory perspective by pinpointing an important role for SCM principles in
classic applications of RCMs; conversely, we offer a characterization of the
algebraic constraints implied by a graph, helping to substantiate further
comparisons between the two frameworks.
( 2
min )
We study versions of Hilbert's projective metric for spaces of integrable
functions of bounded growth. These metrics originate from cones which are
relaxations of the cone of all non-negative functions, in the sense that they
include all functions having non-negative integral values when multiplied with
certain test functions. We show that kernel integral operators are contractions
with respect to suitable specifications of such metrics even for kernels which
are not bounded away from zero, provided that the decay to zero of the kernel
is controlled. As an application to entropic optimal transport, we show
exponential convergence of Sinkhorn's algorithm in settings where the marginal
distributions have sufficiently light tails compared to the growth of the cost
function.
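For context, each Sinkhorn step applies a kernel integral operator, which is exactly the map whose contraction in Hilbert-type projective metrics drives the exponential convergence discussed here. A standard sketch (generic entropic OT, not tied to this paper's specific cones):

```python
import numpy as np

def sinkhorn(C, mu, nu, eps=0.1, iters=500):
    """Sinkhorn iterations for entropic optimal transport between discrete
    marginals mu and nu with cost matrix C; each update is a kernel
    integral operator applied to the current scaling vector."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    return u[:, None] * K * v[None, :]   # the transport plan
```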
( 2
min )
In this post, we show you how to create a MAP connector to AWS HealthImaging, which is reusable in applications built with the MONAI Deploy App SDK, to integrate with and accelerate image data retrieval from a cloud-native DICOM store to medical imaging AI workloads. The MONAI Deploy SDK can be used to support hospital operations. We also demonstrate two hosting options to deploy MAP AI applications on SageMaker at scale.
( 10
min )
This post explores how Amazon CodeWhisperer can help with code optimization for sustainability through increased resource efficiency. Computationally resource-efficient coding is one technique that aims to reduce the amount of energy required to process a line of code and, as a result, aid companies in consuming less energy overall. In this era of cloud computing, […]
( 8
min )
NVIDIA’s AI platform raised the bar for AI training and high performance computing in the latest MLPerf industry benchmarks. Among many new records and milestones, one in generative AI stands out: NVIDIA Eos — an AI supercomputer powered by a whopping 10,752 NVIDIA H100 Tensor Core GPUs and NVIDIA Quantum-2 InfiniBand networking — completed a […]
( 7
min )
When patients in Vietnam enter a medical facility in distress, doctors use NVIDIA technology to get more accurate scans to diagnose their ailments. In Hong Kong, a different set of doctors leverage generative AI to discover new cures for patients. Improving the health and well-being of citizens and strengthening economies and communities are key themes […]
( 6
min )
Clinician-led healthcare AI company Harrison.ai has built an AI system that effectively serves as a “spell checker” for radiologists — flagging critical findings to improve the speed and accuracy of radiology image analysis, reducing misdiagnoses. In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Harrison.ai cofounder and CEO Aengus Tran about […]
( 6
min )
Neural Radiance Fields (NeRF) enable 3D scene reconstruction from 2D images
and camera poses for Novel View Synthesis (NVS). Although NeRF can produce
photorealistic results, it often suffers from overfitting to training views,
leading to poor geometry reconstruction, especially in low-texture areas. This
limitation restricts many important applications which require accurate
geometry, such as extrapolated NVS, HD mapping and scene editing. To address
this limitation, we propose a new method to improve NeRF's 3D structure using
only RGB images and semantic maps. Our approach introduces a novel plane
regularization based on Singular Value Decomposition (SVD), that does not rely
on any geometric prior. In addition, we leverage the Structural Similarity
Index Measure (SSIM) in our loss design to properly initialize the volumetric
representation of NeRF. Quantitative and qualitative results show that our
method outperforms popular regularization approaches in accurate geometry
reconstruction for large-scale outdoor scenes and achieves SoTA rendering
quality on the KITTI-360 NVS benchmark.
( 2
min )
A significant challenge facing researchers in the area of multi-agent
reinforcement learning (MARL) pertains to the identification of a library that
can offer fast and compatible development for multi-agent tasks and algorithm
combinations, while obviating the need to consider compatibility issues. In
this paper, we present MARLlib, a library designed to address the
aforementioned challenge by leveraging three key mechanisms: 1) a standardized
multi-agent environment wrapper, 2) an agent-level algorithm implementation,
and 3) a flexible policy mapping strategy. By utilizing these mechanisms,
MARLlib can effectively disentangle the intertwined nature of the multi-agent
task and the learning process of the algorithm, with the ability to
automatically alter the training strategy based on the current task's
attributes. The MARLlib library's source code is publicly accessible on GitHub:
\url{https://github.com/Replicable-MARL/MARLlib}.
( 2
min )
A quantum thermal machine is an open quantum system that enables the
conversion between heat and work at the micro or nano-scale. Optimally
controlling such out-of-equilibrium systems is a crucial yet challenging task
with applications to quantum technologies and devices. We introduce a general
model-free framework based on Reinforcement Learning to identify
out-of-equilibrium thermodynamic cycles that are Pareto optimal trade-offs
between power and efficiency for quantum heat engines and refrigerators. The
method does not require any knowledge of the quantum thermal machine, nor of
the system model, nor of the quantum state. Instead, it only observes the heat
fluxes, so it is applicable to both simulations and experimental devices. We
test our method on a model of an experimentally realistic refrigerator based on
a superconducting qubit, and on a heat engine based on a quantum harmonic
oscillator. In both cases, we identify the Pareto-front representing optimal
power-efficiency tradeoffs, and the corresponding cycles. Such solutions
outperform previous proposals made in the literature, such as optimized Otto
cycles, reducing quantum friction.
( 2
min )
In this paper, we introduce faster first-order primal-dual algorithms for
minimizing a convex function subject to strongly convex function constraints.
Before our work, the best complexity bound was $\mathcal{O}(1/{\varepsilon})$,
and it remains unclear how to improve this result by leveraging the strong
convexity assumption. We address this issue by developing novel techniques to
progressively estimate the strong convexity of the Lagrangian function. Our
approach yields an improved complexity of $\mathcal{O}(1/\sqrt{\varepsilon})$,
matching the complexity lower bound for strongly-convex-concave saddle point
optimization. We show the superior performance of our methods in
sparsity-inducing constrained optimization, notably Google's personalized
PageRank problem. Furthermore, we show that a restarted version of the proposed
methods can effectively identify the sparsity pattern of the optimal solution
within a finite number of steps, a result that appears to have independent
significance.
( 2
min )
Imitation learning of robot policies from few demonstrations is crucial in
open-ended applications. We propose a new method, Interaction Warping, for
learning SE(3) robotic manipulation policies from a single demonstration. We
infer the 3D mesh of each object in the environment using shape warping, a
technique for aligning point clouds across object instances. Then, we represent
manipulation actions as keypoints on objects, which can be warped with the
shape of the object. We show successful one-shot imitation learning on three
simulated and real-world object re-arrangement tasks. We also demonstrate the
ability of our method to predict object meshes and robot grasps in the wild.
( 2
min )
Interatomic potentials learned using machine learning methods have been
successfully applied to atomistic simulations. However, accurate models require
large training datasets, while generating reference calculations is
computationally demanding. To bypass this difficulty, we propose a transfer
learning algorithm that leverages the ability of graph neural networks (GNNs)
to represent chemical environments together with kernel mean embeddings. We
extract a feature map from GNNs pre-trained on the OC20 dataset and use it to
learn the potential energy surface from system-specific datasets of catalytic
processes. Our method is further enhanced by incorporating into the kernel the
chemical species information, resulting in improved performance and
interpretability. We test our approach on a series of realistic datasets of
increasing complexity, showing excellent generalization and transferability
performance, and improving on methods that rely on GNNs or ridge regression
alone, as well as similar fine-tuning approaches.
( 2
min )
Weakly supervised semantic segmentation (WSSS) aims to bypass the need for
laborious pixel-level annotation by using only image-level annotation. Most
existing methods rely on Class Activation Maps (CAM) to derive pixel-level
pseudo-labels and use them to train a fully supervised semantic segmentation
model. Although these pseudo-labels are class-aware, indicating the coarse
regions for particular classes, they are not object-aware and fail to delineate
accurate object boundaries. To address this, we introduce a simple yet
effective method harnessing the Segment Anything Model (SAM), a class-agnostic
foundation model capable of producing fine-grained instance masks of objects,
parts, and subparts. We use CAM pseudo-labels as cues to select and combine SAM
masks, resulting in high-quality pseudo-labels that are both class-aware and
object-aware. Our approach is highly versatile and can be easily integrated
into existing WSSS methods without any modification. Despite its simplicity,
our approach shows consistent gain over the state-of-the-art WSSS methods on
both PASCAL VOC and MS-COCO datasets.
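A minimal sketch of using CAM values as cues to select and union class-agnostic masks (the selection rule and the threshold are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def cam_guided_pseudolabel(cam, sam_masks, thresh=0.5):
    """Union all class-agnostic masks whose mean CAM activation is high,
    producing a pseudo-label that is both class-aware and object-aware."""
    keep = np.zeros_like(cam, dtype=bool)
    for m in sam_masks:                      # each m: boolean (H, W) mask
        if m.any() and cam[m].mean() > thresh:
            keep |= m
    return keep
```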
( 2
min )
Convolutional neural networks require good algorithms to reduce
complexity and sufficient utilization of parallel processors for acceleration.
Within convolutional layers, there are three types of operators: convolution
used in forward propagation, deconvolution and dilated-convolution utilized in
backward propagation. During the execution of these operators, zeros are
typically added to tensors, leading to redundant calculations and unnecessary
strain on hardware. To circumvent these inefficiencies, we propose the C-K-S
algorithm, accompanied by efficient GPU implementations. C-K-S trims filters to
exclude zero-padding. For deconvolution and dilated-convolution, C-K-S
transforms sparse tensors into dense tensors, and standardizes the local
computational rules to simplify the hardware control. The experimental results
demonstrate that C-K-S offers good performance in terms of speed and
convergence, surpassing the capabilities of PyTorch and cuDNN in certain
scenarios.
( 2
min )
This work introduces the first small-loss and gradual-variation regret bounds
for online portfolio selection, marking the first instances of data-dependent
bounds for online convex optimization with non-Lipschitz, non-smooth losses.
The algorithms we propose exhibit sublinear regret rates in the worst cases and
achieve logarithmic regrets when the data is "easy," with per-iteration time
almost linear in the number of investment alternatives. The regret bounds are
derived using novel smoothness characterizations of the logarithmic loss, a
local norm-based analysis of follow-the-regularized-leader (FTRL) with
self-concordant regularizers, which are not necessarily barriers, and an
implicit variant of optimistic FTRL with the log-barrier.
( 2
min )
We demonstrate a validity problem of machine learning in the vital
application area of disease diagnosis in medicine. It arises when target labels
in training data are determined by an indirect measurement, and the fundamental
measurements needed to determine this indirect measurement are included in the
input data representation. Machine learning models trained on this data will
learn nothing else but to exactly reconstruct the known target definition. Such
models show perfect performance on similarly constructed test data but will
fail catastrophically on real-world examples where the defining fundamental
measurements are not or only incompletely available. We present a general
procedure allowing identification of problematic datasets and black-box machine
learning models trained on them, and exemplify our detection procedure on the
task of early prediction of sepsis.
( 2
min )
Estimating a prediction function is a fundamental component of many data
analyses. The Super Learner ensemble, a particular implementation of stacking,
has desirable theoretical properties and has been used successfully in many
applications. Dimension reduction can be accomplished by using variable
screening algorithms, including the lasso, within the ensemble prior to fitting
other prediction algorithms. However, the performance of a Super Learner using
the lasso for dimension reduction has not been fully explored in cases where
the lasso is known to perform poorly. We provide empirical results that suggest
that a diverse set of candidate screening algorithms should be used to protect
against poor performance of any one screen, similar to the guidance for
choosing a library of prediction algorithms for the Super Learner.
( 2
min )
Kernel density estimation (KDE) is integral to a range of generative and
discriminative tasks in machine learning. Drawing upon tools from the
multidimensional calculus of variations, we derive an optimal weight function
that reduces bias in standard kernel density estimates for density ratios,
leading to improved estimates of prediction posteriors and
information-theoretic measures. In the process, we shed light on some
fundamental aspects of density estimation, particularly from the perspective of
algorithms that employ KDEs as their main building blocks.
( 2
min )
We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of
continuous-depth graph neural networks (GNNs) that employs the Kuramoto model
to mitigate the over-smoothing phenomenon, in which node features in GNNs
become indistinguishable as the number of layers increases. The Kuramoto model
captures the synchronization behavior of non-linear coupled oscillators. Under
the view of coupled oscillators, we first show the connection between the
Kuramoto model and basic GNNs, and then show that the over-smoothing phenomenon
in GNNs can be interpreted as phase synchronization in the Kuramoto model. The
KuramotoGNN
replaces this phase synchronization with frequency synchronization to prevent
the node features from converging into each other while allowing the system to
reach a stable synchronized state. We experimentally verify the advantages of
the KuramotoGNN over the baseline GNNs and existing methods in reducing
over-smoothing on various graph deep learning benchmark tasks.
( 2
min )
In biomedical applications it is often necessary to estimate a physiological
response to a treatment consisting of multiple components, and learn the
separate effects of the components in addition to the joint effect. Here, we
extend existing probabilistic nonparametric approaches to explicitly address
this problem. We also develop a new convolution-based model for composite
treatment-response curves that is more biologically interpretable. We validate
our models by estimating the impact of carbohydrate and fat in meals on blood
glucose. By differentiating treatment components, incorporating their dosages,
and sharing statistical information across patients via a hierarchical
multi-output Gaussian process, our method improves prediction accuracy over
existing approaches, and allows us to interpret the different effects of
carbohydrates and fat on the overall glucose response.
( 2
min )
We show that the likelihood function for a multinomial vector observed under
arbitrary interval censoring constraints on the frequencies or their partial
sums is completely log-concave by proving that the constrained sample spaces
comprise M-convex subsets of the discrete simplex.
( 2
min )
This paper studies Anderson acceleration (AA) for fixed-point methods
${x}^{(k+1)}=q({x}^{(k)})$. It provides the first proof that when the operator
$q$ is linear and symmetric, AA improves the root-linear convergence factor
over the fixed-point iterations. When $q$ is nonlinear, yet has a symmetric
Jacobian at the solution, a slightly modified AA algorithm is proved to have an
analogous root-linear convergence factor improvement over fixed-point
iterations. Simulations verify our observations. Furthermore, experiments with
different data models demonstrate AA is significantly superior to the standard
fixed-point methods for Tyler's M-estimation.
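For reference, a standard AA(m) sketch in the usual difference formulation (generic Anderson acceleration, not the paper's slightly modified variant):

```python
import numpy as np

def anderson(q, x0, m=5, iters=100, tol=1e-10):
    """Anderson acceleration AA(m) for the fixed-point iteration x = q(x)."""
    x = np.asarray(x0, dtype=float)
    f = q(x) - x
    xs, fs = [x], [f]
    for _ in range(iters):
        if np.linalg.norm(f) < tol:
            break
        mk = min(m, len(fs) - 1)
        if mk == 0:
            x = x + f                                   # plain fixed-point step
        else:
            dF = np.stack([fs[-j] - fs[-j - 1] for j in range(1, mk + 1)], axis=1)
            dX = np.stack([xs[-j] - xs[-j - 1] for j in range(1, mk + 1)], axis=1)
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = x + f - (dX + dF) @ gamma               # extrapolated iterate
        f = q(x) - x
        xs.append(x); fs.append(f)
    return x
```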
( 2
min )
We are seeing a flurry of regulation. But we should ask ourselves if we are seeing regulatory capture — i.e., letting corporations write lax rules that lead to public harm. Andrew Ng points out some contradictions: “It’s also a mistake to set reporting requirements based on a computation threshold for model training. This will stifle…
( 20
min )
Large language models (LLMs) with their broad knowledge, can generate human-like text on almost any topic. However, their training on massive datasets also limits their usefulness for specialized tasks. Without continued learning, these models remain oblivious to new data and trends that emerge after their initial training. Furthermore, the cost to train new LLMs can […]
( 14
min )
This research paper was presented at the 64th IEEE Symposium on Foundations of Computer Science (FOCS) 2023, a premier forum for the latest research in theoretical computer science. Submodular functions are versatile mathematical tools, finding diverse applications in real-world scenarios and guiding solutions across complex domains. From dissecting the intricate networks […]
( 10
min )
Taiwanese artist Steven Tung creates captivating 2D and 3D digital art that explores sci-fi, minimalism and realism and pushes artistic boundaries.
( 6
min )
The expressivity of Graph Neural Networks (GNNs) can be entirely
characterized by appropriate fragments of the first-order logic. Namely, any
query of the two variable fragment of graded modal logic (GC2) interpreted over
labeled graphs can be expressed using a GNN whose size depends only on the
depth of the query. As pointed out by [Barcelo & Al., 2020, Grohe, 2021], this
description holds for a family of activation functions, leaving the possibility
for a hierarchy of logics expressible by GNNs depending on the chosen
activation function. In this article, we show that such a hierarchy indeed exists
by proving that GC2 queries cannot be expressed by GNNs with polynomial
activation functions. This implies a separation between polynomial and popular
non-polynomial activations (such as ReLU, sigmoid, and hyperbolic tangent)
and answers an open question formulated by [Grohe, 2021].
( 2
min )
Quantifying the difference between two probability density functions, $p$ and
$q$, using available data, is a fundamental problem in Statistics and Machine
Learning. A usual approach for addressing this problem is the likelihood-ratio
estimation (LRE) between $p$ and $q$, which -- to our best knowledge -- has
been investigated mainly for the offline case. This paper contributes by
introducing a new framework for online non-parametric LRE (OLRE) for the
setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are
observed over time. The non-parametric nature of our approach has the advantage
of being agnostic to the forms of $p$ and $q$. Moreover, we capitalize on the
recent advances in Kernel Methods and functional minimization to develop an
estimator that can be efficiently updated online. We provide theoretical
guarantees for the performance of the OLRE method along with empirical
validation in synthetic experiments.
( 2
min )
An emerging new paradigm for solving inverse problems is via the use of deep
learning to learn a regularizer from data. This leads to high-quality results,
but often at the cost of provable guarantees. In this work, we show how
well-posedness and convergent regularization arises within the convex-nonconvex
(CNC) framework for inverse problems. We introduce a novel input weakly convex
neural network (IWCNN) construction to adapt the method of learned adversarial
regularization to the CNC framework. Empirically we show that our method
overcomes numerical issues of previous adversarial methods.
( 2
min )
Optical computing systems can provide high-speed and low-energy data
processing but face challenges from computationally demanding training and
the simulation-to-reality gap. We propose a model-free solution for lightweight in
situ optimization of optical computing systems based on the score gradient
estimation algorithm. This approach treats the system as a black box and
back-propagates loss directly to the optical weights' probabilistic
distributions, hence circumventing the need for computation-heavy and biased
system simulation. We demonstrate a superior classification accuracy on the
MNIST and FMNIST datasets through experiments on a single-layer diffractive
optical computing system. Furthermore, we show its potential for image-free and
high-speed cell analysis. The inherent simplicity of our proposed method,
combined with its low demand for computational resources, expedites the
transition of optical computing from laboratory demonstrations to real-world
applications.
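A sketch of the score-gradient (REINFORCE-style) estimator applied to a black-box measured loss, with the optical weights drawn from a Gaussian whose mean is optimized; the distribution choice and hyperparameters are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

def score_gradient_step(mu, measured_loss, sigma=0.05, pop=64, lr=0.01, seed=0):
    """One in situ update: sample candidate weights around mu, query only the
    measured (black-box) loss, and move mu along the score-function gradient
    estimate grad = E[(L - baseline) * (w - mu)] / sigma^2."""
    rng = np.random.default_rng(seed)
    w = mu + sigma * rng.standard_normal((pop, mu.size))
    L = np.array([measured_loss(wi) for wi in w])
    grad = ((L - L.mean())[:, None] * (w - mu)).mean(axis=0) / sigma**2
    return mu - lr * grad
```

Because only measured losses enter the update, no differentiable simulator of the optical system is needed, which is the point of the in situ approach described above.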
( 2
min )
Restricting the variance of a policy's return is a popular choice in
risk-averse Reinforcement Learning (RL) due to its clear mathematical
definition and easy interpretability. Traditional methods directly restrict the
total return variance. Recent methods restrict the per-step reward variance as
a proxy. We thoroughly examine the limitations of these variance-based methods,
such as sensitivity to numerical scale and hindering of policy learning, and
propose to use an alternative risk measure, Gini deviation, as a substitute. We
study various properties of this new risk measure and derive a policy gradient
algorithm to minimize it. Empirical evaluation in domains where risk-aversion
can be clearly defined shows that our algorithm can mitigate the limitations
of variance-based risk measures and achieves high return with low risk in terms
of variance and Gini deviation when others fail to learn a reasonable policy.
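For concreteness, one common convention defines the Gini deviation of a return distribution as half the Gini mean difference $E|X - X'|$; a small sample-based sketch (conventions on the factor of one half vary):

```python
import numpy as np

def gini_deviation(returns):
    """Gini deviation as half the Gini mean difference E|X - X'|,
    computed from a sample via the sorted-order identity."""
    r = np.sort(np.asarray(returns, dtype=float))
    n = r.size
    gmd = 2.0 * np.sum((2.0 * np.arange(1, n + 1) - n - 1) * r) / (n * (n - 1))
    return gmd / 2.0

print(gini_deviation([1.0, 1.0, 1.0]))   # 0.0: no dispersion, no risk
```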
( 2
min )
We show how to "compile" human-readable programs into standard decoder-only
transformer models. Our compiler, Tracr, generates models with known structure.
This structure can be used to design experiments. For example, we use it to
study "superposition" in transformers that execute multi-step algorithms.
Additionally, the known structure of Tracr-compiled models can serve as
ground-truth for evaluating interpretability methods. Commonly, because the
"programs" learned by transformers are unknown it is unclear whether an
interpretation succeeded. We demonstrate our approach by implementing and
examining programs including computing token frequencies, sorting, and
parenthesis checking. We provide an open-source implementation of Tracr at
https://github.com/google-deepmind/tracr.
( 2
min )
In gradient descent dynamics of neural networks, the top eigenvalue of the
Hessian of the loss (sharpness) displays a variety of robust phenomena
throughout training. This includes early time regimes where the sharpness may
decrease during early periods of training (sharpness reduction), and later time
behavior such as progressive sharpening and edge of stability. We demonstrate
that a simple $2$-layer linear network (UV model) trained on a single training
example exhibits all of the essential sharpness phenomenology observed in
real-world scenarios. By analyzing the structure of dynamical fixed points in
function space and the vector field of function updates, we uncover the
underlying mechanisms behind these sharpness trends. Our analysis reveals (i)
the mechanism behind early sharpness reduction and progressive sharpening, (ii)
the required conditions for edge of stability, and (iii) a period-doubling
route to chaos on the edge of stability manifold as learning rate is increased.
Finally, we demonstrate that various predictions from this simplified model
generalize to real-world scenarios and discuss its limitations.
( 2
min )
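The UV model is simple enough to replay in a few lines. The hypothetical toy
below (the paper's exact setup may differ) fits $f(u, v) = uv$ to a single
target with squared loss and tracks the top Hessian eigenvalue along gradient
descent; with a balanced, large initialization the sharpness falls toward 2
during training, illustrating sharpness reduction.

```python
import numpy as np

# Toy UV model: f(u, v) = u*v fit to y = 1 with loss 0.5*(uv - y)^2.
y, lr = 1.0, 0.05
u, v = 2.5, 2.5

for t in range(40):
    r = u * v - y
    H = np.array([[v * v, 2 * u * v - y],   # Hessian wrt (u, v)
                  [2 * u * v - y, u * u]])
    sharpness = np.linalg.eigvalsh(H)[-1]   # top eigenvalue
    if t % 5 == 0:
        print(f"step {t:2d}  loss {0.5 * r * r:8.4f}  sharpness {sharpness:6.3f}")
    u, v = u - lr * v * r, v - lr * u * r
# Sharpness falls from ~17.8 toward 2 (sharpness reduction); learning
# rates approaching 2/2 = 1 would instead drive edge-of-stability dynamics.
```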
A stochastic process that arises by composing a function with a Markov
process is called an aggregated Markov process (AMP). The purpose of composing
a Markov process with a function can be a reduction of dimensions, e.g., a
projection onto certain coordinates. The theory around AMPs has been
extensively studied, e.g., by Dynkin, Cameron, Rogers and Pitman, and Kelly,
all of whom provided sufficient conditions for an AMP to remain Markov. In
another direction, Larget provided a canonical representation for AMPs, which
can be used to verify the equivalence of two AMPs. The purpose of this paper
is to describe how the theory of AMPs can be applied to stochastic learning
models as they learn a particular task.
( 2
min )
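A quick illustration of why aggregation generally breaks the Markov property:
the hypothetical sketch below simulates a three-state chain, aggregates two of
its states, and compares next-step probabilities conditioned on histories of
different lengths. The cited works give exact lumpability conditions under
which such differences vanish.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three-state Markov chain, aggregated by g: {0, 1} -> "A", {2} -> "B".
P = np.array([[0.1, 0.8, 0.1],
              [0.5, 0.1, 0.4],
              [0.3, 0.3, 0.4]])
g = np.array(["A", "A", "B"])

x = 0
xs = np.empty(100_000, dtype=int)
for t in range(len(xs)):
    x = rng.choice(3, p=P[x])
    xs[t] = x
ys = g[xs]

def p_next_B(history):
    """Empirical P(Y_{t+1} = 'B' | last len(history) values == history)."""
    h = len(history)
    idx = [t for t in range(h - 1, len(ys) - 1)
           if tuple(ys[t - h + 1 : t + 1]) == history]
    return np.mean(ys[np.array(idx) + 1] == "B")

# If aggregation preserved Markovianity, these three would coincide:
print(p_next_B(("A",)), p_next_B(("A", "A")), p_next_B(("B", "A")))
```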
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from scanned documents. Queries is a feature that enables you to extract specific pieces of information from varying, complex documents using natural language. Custom Queries provides a way for you to customize the Queries feature for your business-specific, non-standard documents […]
( 9
min )
We are excited to announce that Amazon SageMaker JumpStart can now stream large language model (LLM) inference responses. Token streaming allows you to see the model response output as it is being generated instead of waiting for LLMs to finish the response generation before it is made available for you to use or display. The […]
( 7
min )
GPT-4 Turbo with 128K context and lower prices, the new Assistants API, GPT-4 Turbo with Vision, DALL·E 3 API, and more.
( 7
min )
High-fidelity simulators that connect theoretical models with observations
are indispensable tools in many sciences. When coupled with machine learning, a
simulator makes it possible to infer the parameters of a theoretical model
directly from real and simulated observations without explicit use of the
likelihood function. This is of particular interest when the latter is
intractable. In this work, we introduce a simple extension of the recently
proposed likelihood-free frequentist inference (LF2I) approach that has some
computational advantages. Like LF2I, this extension yields provably valid
confidence sets in parameter inference problems in which a high-fidelity
simulator is available. The utility of our algorithm is illustrated by applying
it to three pedagogically interesting examples: the first is from cosmology,
the second from high-energy physics and astronomy, both with tractable
likelihoods, while the third, with an intractable likelihood, is from
epidemiology.
( 2
min )
To quantify uncertainty, conformal prediction methods are attracting
increasing interest and have already been applied successfully in various
domains. However, they are difficult to apply to time series, as the
autocorrelative structure of time series violates basic assumptions required by
conformal prediction. We propose HopCPT, a novel conformal prediction approach
for time series that not only copes with temporal structures but leverages
them. We show that our approach is theoretically well justified for time series
where temporal dependencies are present. In experiments, we demonstrate that
our new approach outperforms state-of-the-art conformal prediction methods on
multiple real-world time series datasets from four different domains.
( 2
min )
Manifolds discovered by machine learning models provide a compact
representation of the underlying data. Geodesics on these manifolds define
locally length-minimising curves and provide a notion of distance, which is
key for reduced-order modelling, statistical inference, and interpolation. In
this work, we propose a model-based parameterisation for distance fields and
geodesic flows on manifolds, exploiting solutions of a manifold-augmented
Eikonal equation. We demonstrate how the geometry of the manifold impacts the
distance field, and exploit the geodesic flow to obtain globally
length-minimising curves directly. This work opens opportunities for statistics
and reduced-order modelling on differentiable manifolds.
( 2
min )
In recent years, federated minimax optimization has attracted growing
interest due to its extensive applications in various machine learning tasks.
While Smoothed Alternating Gradient Descent Ascent (Smoothed-AGDA) has proven
successful in centralized nonconvex minimax optimization, whether and how the
smoothing technique can help in the federated setting remains unexplored.
In this paper, we propose a new algorithm termed Federated Stochastic Smoothed
Gradient Descent Ascent (FESS-GDA), which utilizes the smoothing technique for
federated minimax optimization. We prove that FESS-GDA can be uniformly used to
solve several classes of federated minimax problems and prove new or better
analytical convergence results for these settings. We showcase the practical
efficiency of FESS-GDA on federated learning tasks such as training
generative adversarial networks (GANs) and fair classification.
( 2
min )
We introduce Resilient Multiple Choice Learning (rMCL), an extension of the
MCL approach for conditional distribution estimation in regression settings
where multiple targets may be sampled for each training input. Multiple Choice
Learning is a simple framework to tackle multimodal density estimation, using
the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression
settings, the existing MCL variants focus on merging the hypotheses, thereby
eventually sacrificing the diversity of the predictions. In contrast, our
method relies on a novel learned scoring scheme underpinned by a mathematical
framework based on Voronoi tessellations of the output space, from which we can
derive a probabilistic interpretation. After empirically validating rMCL with
experiments on synthetic data, we further assess its merits on the sound source
localization problem, demonstrating its practical usefulness and the relevance
of its interpretation.
( 2
min )
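The Winner-Takes-All loss at the core of MCL is compact enough to sketch.
Below is a generic PyTorch version in which only the closest of $K$
hypotheses receives gradient for each target; rMCL's learned scoring head and
its Voronoi-based interpretation are omitted here.

```python
import torch

def wta_loss(hypotheses: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Winner-Takes-All loss.

    hypotheses: (batch, K, dim) predictions from K hypothesis heads.
    target:     (batch, dim) sampled target.
    Only each sample's closest hypothesis contributes to the loss.
    """
    dists = ((hypotheses - target[:, None, :]) ** 2).sum(-1)  # (batch, K)
    winner = dists.argmin(dim=1)                              # (batch,)
    return dists.gather(1, winner[:, None]).mean()

hyp = torch.randn(32, 5, 2, requires_grad=True)
tgt = torch.randn(32, 2)
loss = wta_loss(hyp, tgt)
loss.backward()  # gradients flow only through each sample's winning head
```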
In this paper, we focus on the data-driven discovery of a general
second-order particle-based model that contains many state-of-the-art models
for modeling the aggregation and collective behavior of interacting agents of
similar size and body type. This model takes the form of a high-dimensional
system of ordinary differential equations parameterized by two interaction
kernels that appraise the alignment of positions and velocities. We propose a
Gaussian Process-based approach to this problem, where the unknown model
parameters are marginalized by using two independent Gaussian Process (GP)
priors on latent interaction kernels constrained to dynamics and observational
data. This results in a nonparametric model for interacting dynamical systems
with built-in uncertainty quantification. We also develop acceleration
techniques to improve scalability. Moreover, we perform a theoretical analysis
to interpret the methodology and investigate the conditions under which the
kernels can be recovered. We demonstrate the effectiveness of the proposed
approach on various prototype systems, including the selection of the order of
the systems and the types of interactions. In particular, we present
applications to modeling two real-world fish motion datasets that display
flocking and milling patterns up to 248 dimensions. Despite the use of small
data sets, the GP-based approach learns an effective representation of the
nonlinear dynamics in these spaces and outperforms competitor methods.
( 3
min )
Embodying the convergence of AI and academia, the University of Florida Friday inaugurated the Malachowsky Hall for Data Science & Information Technology. The sleek, seven-story building is poised to play a pivotal role in UF’s ongoing efforts to harness the transformative power of AI, reaffirming its stature as one of the nation’s leading public universities. Read article >
( 6
min )
The world’s 5 billion internet users and nearly 54 billion devices generate 3.4 petabytes of data per second, according to IDC. As digitalization accelerates, enterprise IT teams are under greater pressure to identify and block incoming cyber threats to ensure business operations and services are not interrupted — and AI-based cybersecurity provides a reliable way Read article >
( 11
min )
There’s a kind of magic that surrounds a soccer shot so powerful, it leaves spectators, players, and even commentators in a momentary state of awe. Think back to a moment when the sheer force of a strike left an entire Bundesliga stadium buzzing with energy. What exactly captures our imagination with such intensity? While there […]
( 10
min )
Thirteen new graduate student fellows will pursue exciting new paths of knowledge and discovery.
( 14
min )
Rama Ramakrishnan helps companies explore the promises and perils of large language models and other transformative AI technologies.
( 10
min )
Amazon SageMaker Canvas now supports deploying machine learning (ML) models to real-time inferencing endpoints, allowing you take your ML models to production and drive action based on ML-powered insights. SageMaker Canvas is a no-code workspace that enables analysts and citizen data scientists to generate accurate ML predictions for their business needs. Until now, SageMaker Canvas […]
( 6
min )
Recently, teachers and institutions have looked for different ways to incorporate artificial intelligence (AI) into their curriculums, whether it be teaching about machine learning (ML) or incorporating it into creating lesson plans, grading, or other educational applications. Generative AI models, in particular large language models (LLMs), have dramatically sped up AI’s impact on education. Generative […]
( 8
min )
AI technologies are having a massive impact across industries, including media and entertainment, automotive, customer service and more.
( 8
min )
Gear up with gratitude for more gaming time. GeForce NOW brings members a cornucopia of 15 newly supported games to the cloud this week. That’s just the start — there are a total of 54 titles coming in the month of November. Members can also join thousands of esports fans in the cloud with the Read article >
( 8
min )
Visual language processing (VLP) is at the forefront of generative AI, driving advancements in multimodal learning that encompasses language intelligence, vision understanding, and processing. Combined with large language models (LLM) and Contrastive Language-Image Pre-Training (CLIP) trained with a large quantity of multimodality data, visual language models (VLMs) are particularly adept at tasks like image captioning, […]
( 16
min )
Today, personally identifiable information (PII) is everywhere. PII is in emails, slack messages, videos, PDFs, and so on. It refers to any data or information that can be used to identify a specific individual. PII is sensitive in nature and includes various types of personal data, such as name, contact information, identification numbers, financial information, […]
( 8
min )
The home of the first industrial revolution just made a massive investment in the next one. The U.K. government has announced it will spend £225 million ($273 million) to build one of the world’s fastest AI supercomputers. Called Isambard-AI, it’s the latest in a series of systems named after a legendary 19th century British engineer Read article >
( 6
min )
Generative AI and large language models are stirring change across industries — but according to NVIDIA Senior Product Manager of Developer Marketing Annamalai Chockalingam, “we’re still in the early innings.” In the latest episode of NVIDIA’s AI Podcast, host Noah Kravitz spoke with Chockalingam about LLMs: what they are, their current state and their future Read article >
( 5
min )
Virtual fitting room software with AR and AI is the next best alternative to physical stores. With many different kinds of virtual fitting room solutions on offer though, it can be hard to know which ones are the most feasible for your business. Let’s talk about the various approaches to developing such solutions. Types of… Read More »Approaches to creating virtual fitting room software using AR and AI
The post Approaches to creating virtual fitting room software using AR and AI appeared first on Data Science Central.
( 21
min )
The modern digital ecosystem, buzzing with the chatter of data and algorithms, presents both promises and challenges. In this intricate web, generative artificial intelligence (GenAI) shines as a beacon of innovation. To harness this power, enterprises need more than just cutting-edge technology. They need a bridge between ambition and realization—a role aptly filled by… Read More »How technical program managers can build a robust Generative AI future
The post How technical program managers can build a robust Generative AI future appeared first on Data Science Central.
( 21
min )
Generative AI is revolutionizing our creative landscape, unlocking unprecedented possibilities. But at what cost? Dive into the ethical dilemmas of this transformative technology, exploring the fine line between innovation and ethical consideration. 2022 was a huge year for Generative AI. The release of DALL-E 2 in April showed the public the possibilities of text-to-image Gen… Read More »Generative AI ethics: Navigating the boundary between human and machine creativity
The post Generative AI ethics: Navigating the boundary between human and machine creativity appeared first on Data Science Central.
( 23
min )
In the world’s largest solar race car event of the year, the University of New South Wales Sunswift Racing team is having its day in the sun. The World Solar Challenge, which first began some 35 years ago, attracts academic participants from across the globe. This year’s event drew nearly 100 competitors. The race runs Read article >
( 6
min )
The highly anticipated NVIDIA DLSS 3.5 update, including Ray Reconstruction for NVIDIA Omniverse — a platform for connecting and building custom 3D tools and apps — is now available.
( 7
min )
This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Blake Santschi, Business Intelligence Manager, from Schneider Electric. Additional Schneider Electric experts include Jesse Miller, Somik Chowdhury, Shaswat Babhulgaonkar, David Watkins, Mark Carlson and Barbara Sleczkowski. Enterprise Resource Planning (ERP) systems are used by companies to […]
( 10
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Amazon Bedrock is a fully managed service provided by AWS that offers developers access to foundation models (FMs) and the tools to customize them for specific applications. It allows developers to build and scale generative AI applications using FMs through an API, without managing infrastructure. You can choose from various FMs from Amazon and leading […]
( 8
min )
We are excited to announce a simplified version of the Amazon SageMaker JumpStart SDK that makes it straightforward to build, train, and deploy foundation models. The code for prediction is also simplified. In this post, we demonstrate how you can use the simplified SageMaker Jumpstart SDK to get started with using foundation models in just a couple of lines of code.
( 7
min )
Two roads diverged in a wood, and I / I took the one less traveled by, / And that has made all the difference. — Robert Frost. At certain points in the evolution of enterprise artificial intelligence, there’s been a fork in the road. The road less traveled has suggested a different route to a more satisfying kind of… Read More »FAIR knowledge: The key precondition for trusted generative AI
The post FAIR knowledge: The key precondition for trusted generative AI appeared first on Data Science Central.
( 21
min )
A research paper released today describes ways generative AI can assist one of the most complex engineering efforts: designing semiconductors. The work demonstrates how companies in highly specialized fields can train large language models (LLMs) on their internal data to build assistants that increase productivity. Few pursuits are as challenging as semiconductor design. Under a Read article >
( 6
min )
Teachers are the backbone of any educational system. They are not just educators; they are indispensable navigators, mentors, and leaders. Teachers around the world face many challenges, which vary from country to country or even within a city or town. But some challenges are universal, including time management, classroom organization, and creating effective lesson plans. […]
The post Teachers in India help Microsoft Research design AI tool for creating great classroom content appeared first on Microsoft Research.
( 12
min )
Complementary approaches — “HighLight” and “Tailors and Swiftiles” — could boost the performance of demanding machine-learning tasks.
( 11
min )
The SecureLoop search tool efficiently identifies secure designs for hardware that can boost the performance of complex AI tasks, while requiring less energy.
( 10
min )
Two studies find “self-supervised” models, which learn about their environment from unlabeled data, can show activity patterns similar to those of the mammalian brain.
( 11
min )
Systematic reviews are vital for guiding practice, research, and policy, yet
they are often slow and labour-intensive. Large language models (LLMs) could
offer a way to speed up and automate systematic reviews, but their performance
in such tasks has not been comprehensively evaluated against humans, and no
study has tested GPT-4, the biggest LLM so far. This pre-registered study
evaluates GPT-4's capability in title/abstract screening, full-text review, and
data extraction across various literature types and languages using a
'human-out-of-the-loop' approach. Although GPT-4 had accuracy on par with human
performance in most tasks, results were skewed by chance agreement and dataset
imbalance. After adjusting for these, there was a moderate level of performance
for data extraction, and - barring studies that used highly reliable prompts -
screening performance levelled at none to moderate for different stages and
languages. When screening full-text literature using highly reliable prompts,
GPT-4's performance was 'almost perfect.' Penalising GPT-4 for missing key
studies using highly reliable prompts improved its performance even more. Our
findings indicate that, currently, substantial caution should be used if LLMs
are being used to conduct systematic reviews, but suggest that, for certain
systematic review tasks delivered under reliable prompts, LLMs can rival human
performance.
( 3
min )
We study the problem of designing adaptive multi-armed bandit algorithms that
perform optimally in both the stochastic setting and the adversarial setting
simultaneously (often known as a best-of-both-world guarantee). A line of
recent works shows that when configured and analyzed properly, the
Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the
adversarial setting, can in fact optimally adapt to the stochastic setting as
well. Such results, however, critically rely on an assumption that there exists
one unique optimal arm. Recently, Ito (2021) took the first step to remove such
an undesirable uniqueness assumption for one particular FTRL algorithm with the
$\frac{1}{2}$-Tsallis entropy regularizer. In this work, we significantly
improve and generalize this result, showing that uniqueness is unnecessary for
FTRL with a broad family of regularizers and a new learning rate schedule. For
some regularizers, our regret bounds also improve upon prior results even when
uniqueness holds. We further provide an application of our results to the
decoupled exploration and exploitation problem, demonstrating that our
techniques are broadly applicable.
( 3
min )
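For concreteness, here is a small sketch of FTRL over the simplex with the
$\frac{1}{2}$-Tsallis entropy regularizer (in the style of Tsallis-INF) on a
Bernoulli bandit; the sampling distribution is obtained by solving the
one-dimensional normalization equation with Newton's method. The paper's
broader regularizer family and new learning-rate schedule are not reproduced
here.

```python
import numpy as np

rng = np.random.default_rng(0)
K, T = 5, 10_000
means = np.array([0.5, 0.45, 0.6, 0.55, 0.52])  # Bernoulli loss means
L = np.zeros(K)                                 # cumulative loss estimates

def tsallis_probs(L, eta):
    # Solve sum_i 4 / (eta * (L_i - x))^2 = 1 for x < min_i L_i.
    # f is increasing and convex in x, so Newton from a point with
    # f(x) >= 0 converges monotonically to the root.
    x = L.min() - 2.0 / eta
    for _ in range(50):
        f = (4.0 / (eta * (L - x)) ** 2).sum() - 1.0
        fp = (8.0 / eta**2) * ((L - x) ** -3).sum()
        x -= f / fp
    w = 4.0 / (eta * (L - x)) ** 2
    return w / w.sum()

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)
    p = tsallis_probs(L, eta)
    a = rng.choice(K, p=p)
    loss = float(rng.random() < means[a])
    L[a] += loss / p[a]        # importance-weighted loss estimator

print("final play distribution:", np.round(p, 3))  # mass on arm 1
```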
Graph neural networks (GNNs) have become compelling models designed to
perform learning and inference on graph-structured data. However, little work
has been done to understand the fundamental limitations of GNNs for scaling to
larger graphs and generalizing to out-of-distribution (OOD) inputs. In this
paper, we use a random graph generator to systematically investigate how the
graph size and structural properties affect the predictive performance of GNNs.
We present specific evidence that the average node degree is a key feature in
determining whether GNNs can generalize to unseen graphs, and that the use of
multiple node update functions can improve the generalization performance of
GNNs when dealing with graphs of multimodal degree distributions. Accordingly,
we propose a multi-module GNN framework that allows the network to adapt
flexibly to new graphs by generalizing a single canonical nonlinear
transformation over aggregated inputs. Our results show that the multi-module
GNNs improve the OOD generalization on a variety of inference tasks in the
direction of diverse structural features.
( 2
min )
The stochastic gradient descent (SGD) algorithm is the method of choice in many
machine learning tasks thanks to its scalability and efficiency in dealing with
large-scale problems. In this paper, we focus on the shuffling version of SGD
which matches the mainstream practical heuristics. We show the convergence to a
global solution of shuffling SGD for a class of non-convex functions under
over-parameterized settings. Our analysis employs more relaxed non-convex
assumptions than previous literature. Nevertheless, we maintain the desired
computational complexity as shuffling SGD has achieved in the general convex
setting.
( 2
min )
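For reference, shuffling SGD (random reshuffling) differs from textbook SGD
only in how examples are sampled: one pass over a fresh permutation per epoch
instead of sampling with replacement. A minimal sketch on least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true

# Random reshuffling: a new permutation each epoch, one pass over it,
# matching the mainstream practical heuristic the abstract refers to.
w, lr = np.zeros(d), 0.05
for epoch in range(50):
    for i in rng.permutation(n):
        grad = (X[i] @ w - y[i]) * X[i]   # per-example squared-loss gradient
        w -= lr * grad

print("distance to w_true:", np.linalg.norm(w - w_true))
```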
We study the bias of Stochastic Gradient Descent (SGD) to learn low-rank
weight matrices when training deep neural networks. Our results show that
training neural networks with mini-batch SGD and weight decay causes a bias
towards rank minimization over the weight matrices. Specifically, we show, both
theoretically and empirically, that this bias is more pronounced when using
smaller batch sizes, higher learning rates, or increased weight decay.
Additionally, we predict and observe empirically that weight decay is necessary
to achieve this bias. Unlike previous literature, our analysis does not rely on
assumptions about the data, convergence, or optimality of the weight matrices
and applies to a wide range of neural network architectures of any width or
depth. Finally, we empirically investigate the connection between this bias and
generalization, finding that it has a marginal effect on generalization.
( 2
min )
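One simple way to observe such a bias in practice is to track a numerical
rank proxy of each weight matrix during training. A hypothetical helper
(not from the paper):

```python
import numpy as np

def effective_rank(W: np.ndarray, tol: float = 1e-3) -> int:
    """Count singular values above tol * largest -- a simple proxy for
    the numerical rank of a weight matrix."""
    s = np.linalg.svd(W, compute_uv=False)
    return int((s > tol * s[0]).sum())

# Hypothetical usage: log effective_rank(layer_weight) every few epochs.
# Per the abstract, one would expect it to drop more with smaller batch
# sizes, higher learning rates, or stronger weight decay.
W = np.random.default_rng(0).normal(size=(128, 64))
print(effective_rank(W))  # a random Gaussian matrix is full rank: 64
```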
This research underscores the efficacy of Fourier topological optimization in
refining MRI imagery, thereby bolstering the classification precision of
Alzheimer's Disease through convolutional neural networks. Recognizing that MRI
scans are indispensable for neurological assessments, but frequently grapple
with issues like blurriness and contrast irregularities, the deployment of
Fourier topological optimization offered enhanced delineation of brain
structures, ameliorated noise, and superior contrast. The applied techniques
prioritized boundary enhancement, contrast and brightness adjustments, and
overall image lucidity. Employing CNN architectures VGG16, ResNet50,
InceptionV3, and Xception, the post-optimization analysis revealed a marked
elevation in performance. In conclusion, the amalgamation of Fourier topological
optimization with CNNs delineates a promising trajectory for the nuanced
classification of Alzheimer's Disease, portending a transformative impact on
its diagnostic paradigms.
( 2
min )
As large language models (LLMs) are widely adopted, new safety issues and
policies emerge, to which existing safety classifiers do not generalize well.
If we have only observed a few examples of violations of a new safety rule, how
can we build a classifier to detect violations? In this paper, we study the
novel setting of domain-generalized few-shot learning for LLM-based text safety
classifiers. Unlike prior few-shot work, these new safety issues can be hard to
uncover, and we do not get to choose the few examples. We demonstrate that
existing few-shot techniques do not perform well in this setting; instead, we
propose parameter-efficient fine-tuning (PEFT) combined with augmenting the
training data with similar examples from prior existing rules. We empirically
show that our approach of similarity-based data-augmentation + prompt-tuning
(DAPT) consistently outperforms baselines that either do not rely on data
augmentation or on PEFT by 7-17% F1 score in the Social Chemistry moral
judgement and 9-13% AUC in the Toxicity detection tasks, even when the new rule
is loosely correlated with existing ones.
( 2
min )
A fundamental problem of causal discovery is cause-effect inference, learning
the correct causal direction between two random variables. Significant progress
has been made through modelling the effect as a function of its cause and a
noise term, which allows us to leverage assumptions about the generating
function class. The recently introduced heteroscedastic location-scale noise
functional models (LSNMs) combine expressive power with identifiability
guarantees. LSNM model selection based on maximizing likelihood achieves
state-of-the-art accuracy, when the noise distributions are correctly
specified. However, through an extensive empirical evaluation, we demonstrate
that the accuracy deteriorates sharply when the form of the noise distribution
is misspecified by the user. Our analysis shows that the failure occurs mainly
when the conditional variance in the anti-causal direction is smaller than that
in the causal direction. As an alternative, we find that causal model selection
through residual independence testing is much more robust to noise
misspecification and misleading conditional variance.
( 2
min )
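The residual-independence alternative is easy to sketch for the simpler
additive-noise case: regress each variable on the other and measure the
dependence between residual and putative cause with HSIC, preferring the
direction with more independent residuals. For full location-scale models one
would also estimate the conditional scale; this hypothetical example omits
that.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def hsic(a, b, sigma=1.0):
    """Biased HSIC estimate with Gaussian kernels (dependence measure)."""
    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2
        return np.exp(-d2 / (2 * sigma**2))
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(gram(a) @ H @ gram(b) @ H) / n**2

def residual_score(cause, effect):
    model = GradientBoostingRegressor().fit(cause[:, None], effect)
    return hsic(cause, effect - model.predict(cause[:, None]))

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = x**3 + (1 + 0.5 * np.abs(x)) * rng.normal(size=500)  # location-scale noise

# The direction with the lower residual-cause dependence is preferred.
print("x -> y:", residual_score(x, y), " y -> x:", residual_score(y, x))
```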
Cohen et al. (2021) empirically study the evolution of the largest eigenvalue
of the loss Hessian, also known as sharpness, along the gradient descent (GD)
trajectory and observe the Edge of Stability (EoS) phenomenon. The sharpness
increases at the early phase of training (referred to as progressive
sharpening), and eventually saturates close to the threshold of $2 /
\text{(step size)}$. In this paper, we start by demonstrating through empirical
studies that when the EoS phenomenon occurs, different GD trajectories (after a
proper reparameterization) align on a specific bifurcation diagram independent
of initialization. We then rigorously prove this trajectory alignment
phenomenon for a two-layer fully-connected linear network and a single-neuron
nonlinear network trained with a single data point. Our trajectory alignment
analysis establishes both progressive sharpening and EoS phenomena,
encompassing and extending recent findings in the literature.
( 2
min )
Agglomerative hierarchical clustering based on Ordered Weighted Averaging
(OWA) operators not only generalises the single, complete, and average
linkages, but also includes intercluster distances based on a few nearest or
farthest neighbours, trimmed and winsorised means of pairwise point
similarities, amongst many others. We explore the relationships between the
famous Lance-Williams update formula and the extended OWA-based linkages with
weights generated via infinite coefficient sequences. Furthermore, we provide
some conditions for the weight generators to guarantee the resulting
dendrograms to be free from unaesthetic inversions.
( 2
min )
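The OWA view of linkage is easy to make concrete: an intercluster distance is
an ordered weighted average of all cross-cluster pairwise distances, and
particular weight vectors recover the classical linkages. A minimal sketch:

```python
import numpy as np

def owa_linkage(cross_dists: np.ndarray, weights: np.ndarray) -> float:
    """OWA-based intercluster distance: sort all pairwise point distances
    between the two clusters in decreasing order, then take the weighted
    mean with the given (normalised) OWA weights."""
    d = np.sort(cross_dists.ravel())[::-1]
    w = np.asarray(weights, dtype=float)
    return float(d @ (w / w.sum()))

# A 4 x 3 matrix of distances between the points of two clusters.
cross = np.random.default_rng(0).random((4, 3))
n = cross.size
print(owa_linkage(cross, np.eye(n)[0]))   # weight on max -> complete linkage
print(owa_linkage(cross, np.eye(n)[-1]))  # weight on min -> single linkage
print(owa_linkage(cross, np.ones(n)))     # uniform weights -> average linkage
```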
We describe a new direct method to estimate bipartite mutual information of a
classical spin system based on Monte Carlo sampling enhanced by autoregressive
neural networks. It allows studying arbitrary geometries of subsystems and can
be generalized to classical field theories. We demonstrate it on the Ising
model for four partitionings, including a multiply-connected even-odd division.
We show that the area law is satisfied for temperatures away from the critical
temperature: the constant term is universal, whereas the proportionality
coefficient is different for the even-odd partitioning.
( 2
min )
Graph generative model evaluation necessitates understanding differences
between graphs on the distributional level. This entails being able to harness
salient attributes of graphs in an efficient manner. Curvature constitutes one
such property that has recently proved its utility in characterising graphs.
Its expressive properties, stability, and practical utility in model evaluation
remain largely unexplored, however. We combine graph curvature descriptors with
emerging methods from topological data analysis to obtain robust, expressive
descriptors for evaluating graph generative models.
( 2
min )
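As one concrete curvature descriptor, the sketch below computes the
unaugmented Forman-Ricci curvature of every edge, $F(u, v) = 4 - \deg(u) -
\deg(v)$ for unweighted graphs, and compares its distribution across two
graphs. The paper's combination of such descriptors with topological data
analysis goes beyond this sketch.

```python
import numpy as np
import networkx as nx

def forman_curvature(G: nx.Graph) -> np.ndarray:
    """Unaugmented Forman-Ricci curvature per edge of an unweighted graph:
    F(u, v) = 4 - deg(u) - deg(v)."""
    return np.array([4 - G.degree(u) - G.degree(v) for u, v in G.edges])

# Curvature distributions as a cheap descriptor for comparing graphs,
# e.g. a generated sample against a reference distribution.
ref = nx.erdos_renyi_graph(100, 0.05, seed=0)
gen = nx.barabasi_albert_graph(100, 3, seed=0)
for name, G in [("reference", ref), ("generated", gen)]:
    c = forman_curvature(G)
    print(name, "mean curvature:", c.mean(), "min:", c.min())
```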
Generative diffusion models have achieved spectacular performance in many
areas of generative modeling. While the fundamental ideas behind these models
come from non-equilibrium physics, in this paper we show that many aspects of
these models can be understood using the tools of equilibrium statistical
mechanics. Using this reformulation, we show that generative diffusion models
undergo second-order phase transitions corresponding to symmetry breaking
phenomena. We argue that this leads to a form of instability that lies at the
heart of their generative capabilities and that can be described by a set of
mean field critical exponents. We conclude by analyzing recent work connecting
diffusion models and associative memory networks in view of the thermodynamic
formulations.
( 2
min )
Flexible models for probability distributions are an essential ingredient in
many machine learning tasks. We develop and investigate a new class of
probability distributions, which we call a Squared Neural Family (SNEFY),
formed by squaring the 2-norm of a neural network and normalising it with
respect to a base measure. Following reasoning similar to the well-established
connections between infinitely wide neural networks and Gaussian
processes, we show that SNEFYs admit closed form normalising constants in many
cases of interest, thereby resulting in flexible yet fully tractable density
models. SNEFYs strictly generalise classical exponential families, are closed
under conditioning, and have tractable marginal distributions. Their utility is
illustrated on a variety of density estimation, conditional density estimation,
and density estimation with missing data tasks.
( 2
min )
Neural additive models (NAMs) can improve the interpretability of deep neural
networks by handling input features in separate additive sub-networks. However,
they lack inherent mechanisms that provide calibrated uncertainties and enable
selection of relevant features and interactions. Approaching NAMs from a
Bayesian perspective, we enhance them in three primary ways, namely by a)
providing credible intervals for the individual additive sub-networks; b)
estimating the marginal likelihood to perform an implicit selection of features
via an empirical Bayes procedure; and c) enabling a ranking of feature pairs as
candidates for second-order interaction in fine-tuned models. In particular, we
develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical
performance on tabular datasets and challenging real-world medical tasks.
( 2
min )
Stein thinning is a promising algorithm proposed by Riabiz et al. (2022) for
post-processing outputs of Markov chain Monte Carlo (MCMC). The main principle
is to greedily minimize the kernelized Stein discrepancy (KSD), which only
requires the gradient of the log-target distribution, and is thus well-suited
for Bayesian inference. The main advantages of Stein thinning are the automatic
removal of the burn-in period, the correction of the bias introduced by recent
MCMC algorithms, and the asymptotic properties of convergence towards the
target distribution. Nevertheless, Stein thinning suffers from several
empirical pathologies, which may result in poor approximations, as observed in
the literature. In this article, we conduct a theoretical analysis of these
pathologies, to clearly identify the mechanisms at stake, and suggest improved
strategies. Then, we introduce the regularized Stein thinning algorithm to
alleviate the identified pathologies. Finally, theoretical guarantees and
extensive experiments show the high efficiency of the proposed algorithm. An
implementation of regularized Stein thinning as the kernax library in Python
and JAX is available at https://gitlab.com/drti/kernax.
( 3
min )
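To fix ideas, here is a minimal 1-D sketch of the vanilla greedy procedure: at
each step the candidate minimizing the increase in kernelized Stein
discrepancy is added, using a Langevin Stein kernel built from an IMQ base
kernel. The regularized variant proposed in the paper, and the kernax
implementation, go beyond this sketch.

```python
import numpy as np

def stein_kernel(x, y, score):
    """Langevin Stein kernel in 1-D built from an IMQ base kernel
    k(x, y) = (1 + (x - y)^2)^(-1/2); score = d/dx log p."""
    d = x - y
    q = 1.0 + d * d
    k = q ** -0.5
    dkx, dky = -d * q ** -1.5, d * q ** -1.5
    dkxy = q ** -1.5 - 3.0 * d * d * q ** -2.5
    return dkxy + score(x) * dky + score(y) * dkx + score(x) * score(y) * k

# Vanilla greedy Stein thinning of biased "MCMC" samples, targeting
# N(0, 1) so that score(x) = -x.
score = lambda x: -x
rng = np.random.default_rng(0)
candidates = rng.normal(loc=0.5, scale=1.3, size=2000)

selected = []
for _ in range(50):
    # KSD increase from adding each candidate to the current selection.
    cross = sum(stein_kernel(candidates, xj, score) for xj in selected)
    obj = stein_kernel(candidates, candidates, score) + 2.0 * cross
    selected.append(candidates[np.argmin(obj)])

print(np.mean(selected), np.std(selected))  # pulled toward the N(0, 1) target
```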
The out-of-sample error (OO) is the main quantity of interest in risk
estimation and model selection. Leave-one-out cross validation (LO) offers a
(nearly) distribution-free yet computationally demanding approach to estimate
OO. Recent theoretical work showed that approximate leave-one-out cross
validation (ALO) is a computationally efficient and statistically reliable
estimate of LO (and OO) for generalized linear models with differentiable
regularizers. For problems involving non-differentiable regularizers, despite
significant empirical evidence, the theoretical understanding of ALO's error
remains unknown. In this paper, we present a novel theory for a wide class of
problems in the generalized linear model family with non-differentiable
regularizers. We bound the error $|\text{ALO} - \text{LO}|$ in terms of
intuitive metrics such as the size of leave-$i$-out perturbations in active
sets, the sample size $n$, the number of features $p$, and the regularization
parameters. As a consequence, for $\ell_1$-regularized problems, we show that
$|\text{ALO} - \text{LO}| \to 0$ as $p \to \infty$ while $n/p$ and the SNR are
fixed and bounded.
( 2
min )
Many real-world domains require safe decision making in uncertain
environments. In this work, we introduce a deep reinforcement learning
framework for approaching this important problem. We consider a distribution
over transition models, and apply a risk-averse perspective towards model
uncertainty through the use of coherent distortion risk measures. We provide
robustness guarantees for this framework by showing it is equivalent to a
specific class of distributionally robust safe reinforcement learning problems.
Unlike existing approaches to robustness in deep reinforcement learning,
however, our formulation does not involve minimax optimization. This leads to
an efficient, model-free implementation of our approach that only requires
standard data collection from a single training environment. In experiments on
continuous control tasks with safety constraints, we demonstrate that our
framework produces robust performance and safety at deployment time across a
range of perturbed test environments.
( 2
min )
Generative artificial intelligence is transforming how enterprises do business. Organizations are using AI to improve data-driven decisions, enhance omnichannel experiences, and drive next-generation product development. Enterprises are using generative AI specifically to power their marketing efforts through emails, push notifications, and other outbound communication channels. Gartner predicts that “by 2025, 30% of outbound marketing messages […]
( 8
min )
Visualization is vital for understanding complex data, but existing tools require “tidy data,” adding extra steps. Learn how Data Formulator transforms concepts into visuals, promoting collaboration between analysts and AI agents.
The post Data Formulator: A concept-driven, AI-powered approach to data visualization appeared first on Microsoft Research.
( 10
min )
AI Weirdness: the strange side of machine learning
( 2
min )
Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra helps you easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer. Drupal is content management software. It’s used to make many […]
( 7
min )
This is a guest post by Jose Benitez, Founder and Director of AI, and Mattias Ponchon, Head of Infrastructure, at Intuitivo. Intuitivo, a pioneer in retail innovation, is revolutionizing shopping with its cloud-based AI and machine learning (AI/ML) transactional processing system. This groundbreaking technology enables us to operate millions of autonomous points of purchase (A-POPs) […]
( 8
min )
Enterprises seek to harness the potential of Machine Learning (ML) to solve complex problems and improve outcomes. Until recently, building and deploying ML models required deep levels of technical and coding skills, including tuning ML models and maintaining operational pipelines. Since its introduction in 2021, Amazon SageMaker Canvas has enabled business analysts to build, deploy, […]
( 8
min )
Researchers are taking deep learning for a deep dive, literally. The Woods Hole Oceanographic Institution (WHOI) Autonomous Robotics and Perception Laboratory (WARPLab) and MIT are developing a robot for studying coral reefs and their ecosystems. The WARPLab autonomous underwater vehicle (AUV), enabled by an NVIDIA Jetson Orin NX module, is an effort from the world’s Read article >
( 8
min )
The cloud is full of treats this GFN Thursday with Cities: Skylines II now streaming, leading 15 newly supported games this week. The game’s publisher, Paradox Interactive, is offering GeForce NOW one-month Priority memberships for those who pick up the game first, so make sure to grab one before they’re gone. Among the newly supported Read article >
( 7
min )
This research paper was presented at the 29th ACM Symposium on Operating Systems Principles (opens in new tab) (SOSP 2023), the premier forum for the theory and practice of computer systems software. For millennia, data has woven itself into every facet of our lives, from business and academia to personal spheres. Our production of data […]
The post Project Silica: Sustainable cloud archival storage in glass appeared first on Microsoft Research.
( 10
min )
Methane (CH4) is a major anthropogenic greenhouse gas that’s a by-product of oil and gas extraction, coal mining, large-scale animal farming, and waste disposal, among other sources. The global warming potential of CH4 is 86 times that of CO2 and the Intergovernmental Panel on Climate Change (IPCC) estimates that methane is responsible for 30 percent of observed […]
( 12
min )
In this issue: Kosmos-2.5: A Multimodal Literate Model; Can vine copulas explain complex relationships of weather variables; New system accelerates the adaptive training process; Structural inequalities and relational labor in the influencer industry.
The post Research Focus: Week of October 23, 2023 appeared first on Microsoft Research.
( 10
min )
NVIDIA researchers are collaborating with academic centers worldwide to advance generative AI, robotics and the natural sciences — and more than a dozen of these projects will be shared at NeurIPS, one of the world’s top AI conferences. Set for Dec. 10-16 in New Orleans, NeurIPS brings together experts in generative AI, machine learning, computer Read article >
( 8
min )
In today’s information age, the vast volumes of data housed in countless documents present both a challenge and an opportunity for businesses. Traditional document processing methods often fall short in efficiency and accuracy, leaving room for innovation, cost-efficiency, and optimizations. Document processing has witnessed significant advancements with the advent of Intelligent Document Processing (IDP). With […]
( 20
min )
This post is co-authored by Dhurjati Brahma, Senior Systems Architect at T-Mobile US, Inc and Jim Chao, Principal Engineer/Architect at T-Mobile US, Inc and Nicholas Zellerhoff Associate Systems Architect at T-Mobile US, Inc. T-Mobile US, Inc. provides a Voicemail to Text service to its customers, which allows customers to quickly read through their voicemails and […]
( 7
min )
Written by Venkata Nori and Kshitij Gopali. As technology evolves, most companies in the world are adopting advanced mechanisms for their daily tasks of storing and updating data, project management & tracking, incident management, version control, etc. Periodically, these companies’ business stakeholders want to extract and analyze the data to see how the business… Read More »Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques
The post Seamless integration of data from unconventional source systems into Business Intelligence using data science techniques appeared first on Data Science Central.
( 25
min )
A recent interview by Medical Device Network with GlobalData medical analyst Alexandra Murdoch shares interesting insights into cybersecurity for medical devices.
The post How data science and medical device cybersecurity cross paths to protect patients and enhance healthcare appeared first on Data Science Central.
( 22
min )
Visual effects artist Surfaced Studio returns to 'In the NVIDIA Studio' to share his real-world VFX project, created on a brand new Razer Blade 16 Mercury Edition laptop powered by GeForce RTX 4080 graphics.
( 8
min )
This post is co-authored by Anatoly Khomenko, Machine Learning Engineer, and Abdenour Bezzouh, Chief Technology Officer at Talent.com. Founded in 2011, Talent.com is one of the world’s largest sources of employment. The company combines paid job listings from their clients with public job listings into a single searchable platform. With over 30 million jobs listed […]
( 12
min )
Images such as those in Google Street View are taking on a new purpose in the hands of University of Florida Assistant Professor of Artificial Intelligence Chaofeng Wang. He’s using them, along with deep learning, in a research project to automate the evaluation of urban buildings. The project aims to help governments mitigate natural disaster Read article >
( 6
min )
The 15th Kendall Square Association annual meeting explored new and old aspects of the neighborhood.
( 9
min )
Sponsored Post. The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all-day, in-person event will have three featured speakers and two tech-talk tracks with four concurrent presentations in each track. The […]
The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.
( 10
min )